A dynamic independent component analysis approach to fault detection with new statistics

This paper presents a fault detection method based on Dynamic Independent Component Analysis (DICA) with new statistics. These new statistics are statistical moments and first characteristic function that surrogate the norm operator to calculate the fault detection statistics to determine the control limit of the independent components (ICs). The estimation of first characteristic function by its series is modified such that the effect of series remainder on estimation is reduced. The advantage of using first characteristic function and moments, over second characteristic function and cumulants, as fault detection statistics is also presented. It is shown that the proposed method can detect a class of faults that the former methods cannot; in particular faults with small amplitude ICs that have either different probability density function or identical probability density function of the ICs, but different low order moments of the ICs compared with the normal performance. Simulation results are used to show the effectiveness of the proposed method.


INTRODUCTION
Many industrial plants encompass a large number of control loops and sensors are employed to measure the variable in each individual loop. The real measured variables are sampled periodically and the collected data is used for investigation the remote monitoring system [1] and fault detection strategies. However, after a short period of time the large data sets are assembled that need to be analyzed, and processing these large data sets in a short time requires powerful computational facilities which may not be available in all industrial site. Consequently, it is crucial to use strategies with low computational cost and fast to extract the important data from such large data sets and the statistical approaches for data analysis and classification can be used to achieve this objective. Require to these approaches to be sensitive, because the number of large scale systems increase by proposed the method of filtering of sensor reading in sensor networks such as [2].
In plants such as chemical industries and nuclear power plants, where faults can lead to biological and human tragedies, fault detection at the early stages, before they develop into major defects, is crucial. Being detected at the early stages, problems can be solved by replacing exhausted components and consequently major failures can be prevented. Moreover, fault detection at the early stages prevents failure extension and hence reduces the need for more costly repairs, improves the system reliability and reduces the system downtime [3, 4, 5, 6, and 7]. However, fault detection and root cause diagnosis are challenging tasks. In most industrial plants, the waste exchange lines are installed to preclude raw material loss. This leads to systems with highly interactive loops; in particular, when the buffer tanks are bypassed to save energy. In such systems, the influence of a fault, occurred in one of the loops, propagates through all the loops and makes it is impossible for the operator to identify the root cause of the fault. Therefore, fault detection and root cause diagnosis at the early stages, when the fault amplitude is small, is of critical importance. Thornhill and Horch reviewed and categorized the suggested methods in plant-wide disturbance detection and diagnosis [8]. Decomposition is one of the proposed strategies which includes three methods: Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Non-Negative Matrix Factorization (NNMF). PCA has been widely used to solve the stated problems. This approach projects the data space onto a lower dimensional subspace such that the correlation between the projected variables is minimized. It also creates some new unrelated variables by which faults root cause can be detected. The PCA method is well adapted to the Gaussian latent variables. It is worth mentioning that in Gaussian distribution, uncorrelation leads to independence. However, Martin and Morris showed that latent variables of most real systems do not follow a Gaussian distribution [9]. The other method to dimension reduction are proposed by Seungdo in [10].
Lee et al. introduced a new statistical process monitoring approach using ICA [11]. In the ICA method, statistical dependence between latent variables is minimized and it is shown that it reveals more useful information from the observation data in comparison with PCA. However, in fault detection methods based on the ICA, the system is assumed to be static. Lee et al. extended the approach and proposed Dynamic ICA (DICA) to overcome this disadvantage [12]. The idea behind this method is adopted from [13,14]. The drawback of this approach is the drastic increase in dimensions as a result of expanding data vector to augmented matrix. Teimoortashloo et al. constructed the new observation matrices to improve the drawbacks of dealing with high dimensional systems, such as reduction in the running time, in the number of arithmetic operations, and in the error of orthogonalization estimation, as well as the overall improvement of capability of fault detection [15]. Another assumption in fault detection methods using ICA/DICA is that the variables do not follow a Gaussian distribution. Ge and Song combined PCA and ICA and proposed a new approach to deal with Gaussian and non-Gaussian latent variable [16]. Ge and Song proposed Maximum-likelihood mixture factor analysis model for process monitoring [17]. Ge et al. Improved kernel PCA-based monitoring approach for nonlinear processes [18]. Brys et al. showed some of the weak points of ICA-based fault detection; one of which is the assumption that the offline part observations have no outliers, although this does not always true [19]. In particular circumstances, if a fault occurs in the offline part, it is assumed as normal performance and hence remains undetected in the online part. Therefore, outlier rejection method is proposed by [20]. They also showed that using square of 2-norm for calculating the control limit, which produces elliptical control limit, is not suitable if ICA/DICA method leads to a joint distribution that the mutual independent components have rectangular-shape. Thus they suggested adjusted outlier (AO) approach that can produce a rectangular-shape control limit.
Ge et al. proposed Batch process monitoring based on the support vector data description method [21]. Miao et al. developed a new nonlinear fault detection method based on locally linear embedding [22]. Ge and Song proposed a distributionfree method for process monitoring [23]. Ge et al. reviewed on data-based process monitoring [24]. Mu-Chen et al. used Durbin Watson index to choose the dominant ICs and support vector data description as fault detection statistics [25]. Huang and Yan proposed DPCA-DICA and Bayesian Inference to put variable into a block on the basis of the variable normality and then used Jarque-Bera test [26]. Zvokelj et al. combined ensemble empirical mode decomposition with ICA to decompose signal to difference time scales [29].
Statistical moments and cumulants have been widely used in signal processing, blind system identification and blind source separation [30]. Statistical moments and cumulants as well as first and second characteristic function, moment generating function, and cumulants generating function are well documented in the literature [30, 31, 32, 33 and 13]. In signal processing, if probability density function (pdf) to be existed and their distribution to be continued, finding the moments can be lead to finding their pdf, and the moments can be calculated from the moment generating function [13]. The characteristic function is also a moment generating function [31]. Utilizing characteristic function is more appropriate than distribution functions as it can be estimated by smaller sample size [34]. Moreover, characteristic function always exists but moment generating function does not necessarily exist [31]. In fact, it is argued that when the sample size is small, analyzing the model by two first moments and a selection of higher order moments yields better results, in terms of stability and robustness, rather than approaches in which the use of full set of higher order moments is suggested [35,36]. While increasing the number of moments leads to better approximation of distribution function, in practice moments higher than the fourth order are rarely used or needed; that is due to the fact that higher-order moments are sensitive to outliers [13].
The main contributions of this paper can be summarized as follows. The new statistics are introduced in order to detect a class of faults with the small amplitude ICs that the former methods fail to detect. These new statistics are the lower statistical moments and first characteristic function that surrogate the 2-norm or AO operators to calculate the control limit of the independent components (ICs). The fault detection occurs when the moments or value of the first characteristic function of ICs, at certain frequency, are out of the normal performance interval. This means that the pdf of ICs or some moments differ from those of the normal performance.

Motivating example
To clarify the benefits of the proposed method over the former ones, see Figure 1. This figure shows the joint distribution of the two IC vectors. The red dots are normal operation ICs and the green squares are faulty ICs. Using 2-norm or AO approach leads to an elliptical control limit or a rectangular-shape control limit, respectively, and neither of them can detect the fault (see black dashed circle or blue dashed rectangular in Figure 1). It is worth mentioning that the fault detection occurs when the 2norm or AO operator of ICs are out of the normal performance interval, control limit. On the other hand, low order moments of ICs detect this fault because the density of ICs increases in the left hand side of joint distribution plot. The rest of the paper is organized as follows. Section two presents a brief review of DICA and fault detection using DICA. New statistics to determine the control limit are introduced in section three, and section four describes the proposed algorithm. Simulation results and conclusion are included in sections five and six, respectively.

6
computer systems science & engineering

FAULT DETECTION
This section provides a brief review of the DICA algorithm and DICA-based fault detection method.

The ICA algorithm
In the ICA algorithm, the idea is to find m ICs (s r r = 1, . . . , m as elements of random vector s(k) ∈ R m×1 ) as source from the m observed variables (x r r = 1, . . . , m as element of random vector x(k) ∈ R m×1 ) that are sampled from the sensors. The number of variables can be bigger than the ICs, and vice versa. In general, in fault detection methods, the number of variable and ICs are assumed to be equal. In this method, N vectors of s(k) ∈ R m×1 and x(k) ∈ R m×1 , respectively, are constructed matrices x ∈ R m×N and S ∈ R m×N (N is number of samples of each variable). The ICA algorithm searches to find the mixing matrix A ∈ R m×m and the ICs matrix S ∈ R m×N , which are unknown matrices, from the known matrix x ∈ R m×N , where The observed variables are typically mean centered. By centering the observed variables, i.e., subtracting their sample mean simplifies the theory and algorithms. It is made in the rest of this manuscript. Equation (1) can be written as S = W.X, where W = A −1 . Matrix x is whitened using Q ∈ R m×m where Q = − 1 2 U T , ∈ R m×m , is a diagonal matrix with the eigenvalues of matrix R x = E(x(k)x T (k)) as its diagonal elements(x(k) is the random vector at sample k with covariance R x where E represents expectations see [11]). U ∈ R m×m is a matrix constructed by eigenvectors of R x ∈ R m×m as its columns. The whitening matrix Z is obtained by: Where B = QA. Whitening the matrix X means that its components to be uncorrelated and their variances to be equaled to unity. In other words, the covariance matrix of X to be equaled the identity matrix. The whitening is a good idea to reduce the complexity of the ICA algorithm because it reduces the number of elements that should be estimated (see [13]). It is easy to show that B −1 = B T . Therefore, the problem of finding a full rank matrix A is reduced to finding an orthogonal matrix B ∈ R m×m . Equation (2) is a matrix equation with two unknown matrices, S and B, which can be rewritten as S = B T Z, with two constrains: first, B ∈ R m×m is orthonormal matrix and second, the statistical dependence between elements of vector s(k) ∈ R m×1 are minimized.
Fast ICA, which employs fixed point optimization algorithm, was used by [11,12] for fault detection is as follows: 1. Choose m and Set counter i = 1 2. Choose a random initial vector b i ∈ R m×1 with unit norm (b i is a i th column of matrix B ∈ R m×m ). To summarize, the matrix B is determined by the abovementioned method. The independent component matrix, s, then is calculated as s = B T Z, and finally the de-mixing matrix, w ∈ R m×m , is calculated as W = B T Q.

Fault detection based on the DICA
The fault detection algorithm proposed by Lee et al. has two parts, in rest of this paper this method is called former method [11,12]. An Offline part in which the normal operating data is obtained by an offline training; and an online part, which an online monitoring is performed to detect deviation from the normal operating data. It is worth mentioning that in fault detection, the main idea of ICA is blind source separation to diagnosis the root cause of fault.

The offline part
The offline part includes the following steps: Step1: The observation matrix X ∈ R m(N+1) is obtained from the assembled data and in order to carry out DICA, the time lag l (proposed as l = 2 by [12]) is chosen and the augmented matrix x L (X L ∈ R (l+1)m×N ) is constructed by augmenting columns vol 33 no 1 January 2018 X(k + 1), . . . , X(k+l) to column X(k) for k = 1, . . . , N. Then the augmented matrix is normalized using the mean and standard deviation of each row.
Step 4: Rearrange and divide matrix W ∈ R m(l+1)×m(l+1) into two parts: the dominant part, W d ∈ R d×m(l+1) , and the excluded part, W e ∈ R (m(l+1)−d×m(l+1)) , i.e. W = W d W e . d is number of rows of dominant part of W (see [11]). Note that the norm of rows of matrix W is used for this division; higher norm construct W d and lower norm construct W e . Rearrange and divide matrix Then the IC matrix space can be decomposed into two subspaces; the dominant part, S d ∈ R d×N , and the excluded part S e ∈ R (m(l+1)−d)×N and can be written as S = S d S e where s d = w d X L and S e = W e X L .
Step 5: Three fault detection statistics are calculated to monitor the process: Step 6: Confidence limits of the three statistics are calculated by KDE (99% or 95%) to determine the control limit. KDE is a technique that appropriately determines the control limits via these statistics, while it simultaneously satisfies the confidence bounds. A univariate kernel estimator with kernel k is defined aŝ where h is a contant value namely window width, x is the considered data point, x i is the observation and k is kernel function that usually choses as Gaussian density function [11].

The online part
The online part consists of the following steps: Step1: For a new observation data vector (new samples of each variable) the augmented vector x newL (k) ∈ R (l+1)m×1 is constructed by augmenting columns X new (k+1), . . . , XX new (k+1) to column X new (k). Then it is normalized by the offline part mean and standard deviation of each row as in step 1 of the offline part.
Step 2: The dominant and excluded parts of ICs are calculated by offline W d and W e as: Step 3: As in step 5 of the offline part three fault detection statistics are calculated as: The systematic excluded part I 2 en (k) = s T en (k)s en (k) The non-systematic part S P E n (k) = e T n (k)e n (k) where e n (k) = If these new statistics are out of the control limit, a fault has occurred.
Step 4: Root causes of the faults are detected by using the contribution value. Contribution value reveals the ICs that most influence the fault detection statistics.

NEW FAULT DETECTION STATISTICS TO DETERMINE THE CONTROL LIMIT
In the former fault detection methods, control limit is a vector norm (2-norm) of ICs which creates a geometrical shape such that to embed the normal values of ICs acquired in the offline part [11]. Operation condition in the online part is then defined in accordance with this shape. The square of 2-norm value of ICs vectors being outside of this shape in the online part indicates that a fault/faults has/have occurred. This means that these methods cannot detect faults if the square of 2-norm of the ICs vectors in the online part is inside the control limit shape, see Figure 1.
In other words, faults that lead to short amplitude ICs remain undetected until they increase to a crucial level and hence the norm of ICs vector exceeds the control limit shape, which can be too late for any further action.
To overcome this drawback, this paper presents a new approach, which the new fault detection statistics are used to define the control limit. In the proposed method, faults will be detected if the pdf, providing that its change can be estimated by low order moments of the first characteristic function, or the low order moments differ from those in the normal operating data.

Characteristic function
Consider a scalar random variable s with the pdf f (s) (usually the pdf is unknown) and there is available, a set of m(l + 1) samples s 1 , . . . , s r . . . , s m(l+1) from s that construct the vector s(k). According to Hyvarinen et al., the first characteristic function can be defined as the continues Fourier transform of pdf [13]: And the second characteristic function can be expressed as follows (see [31]): Where, μ i is the i th moment and κ i is the i th cumulant. In principle, the first characteristic function, φ(ω), is presented as a Taylor series and the second characteritic function, ψ(ω), is presented as a power series in which each term is a Taylor series. μ i is defined as: However, deu to the fact that the pdf is difficult to obtain in practice, the moment are often estimated by the sample mean as [38]: Which s r is the r th sample of scalar random variable s. According to Kendall and Stuart, nevertheless, the cumulants as well as the moments cannot be calculated directly by a summation operator [31]. Thus, it is necessary to, first, calculate the moments and then substituted the calculated moments in Equation (7) to acquire the cumulants. Providing that the mean of s is nonzero, first cumulant (mean), second cumulant (variance), third cumulant (skewness), and fourth cumulant (kurtosis) can be calculated as follows [13]: It is easy to show that if the i th moment of scalar random variable s is shown by μ 1 then the i th moment of scalar random variable ls is calculated by l i μ 1 . Where is a constant. It means that if the scale of samples decreases (l < 1) the higher order moments will decrease more than the lower order moments. While if the i th cumulant of scalar random variable s is shown by κ i then the cumulant of scalar random variable ls is calculated by κ i , too. It means that if the scale of samples decreases the higher order moments will not decrease more than the lower order moments.

Comparison between the first and the second Characteristic function as fault detection statistics
In this paper, the first characteristic function is used as fault detection statistics to determine the control limit. More precisely, the first characteristic function is estimated by the first four terms of the series given in Equation (3), and then it is used to determine a new control limit. The first characteristic function is chosen because its series (providing all terms of series be able to estimated) has an infinite radius of convergence, whereas the radius of convergence of the second characteristic function series is finite (even if all terms of series be able to estimated). Furthermore, the propagation of error in estimation the series terms of second characteristic function is bigger than the error of series terms of first characteristic function. Moreover, the error of series remainder can be reduced by changing the scale of all ICs in first characteristic function, while it is not true in second characteristic function.

Radius of convergence of the first and second characteristic function
According to Equation (3), the first characteristic function is expanded as a Taylor series of exponential function. The radius of convergence in exponential function is infinite. While according to Equation (4), the second characteristic function is expanded as a power series of the logarithm function in which each term is a Taylor series with infinite convergence radius. As a result, the second characteristic function can be written as: Since μ 0 = 1, see [31]. It converges if Inequality (9) shows that Equation (8) may diverge for certain values of s and ω.

The propagation of error
Propagation of error is inevitable in indirect measurement. According to Equation (7), cumulants are calculated by using the moments and hence the cumulant computation suffers from propagation of error. In Appendix A shows that propagation of error in estimation the cumulants is bigger than the error of moments. Thus, estimation of the first characteristic function has smaller error in comparison with the second characteristic function due to propagation of error. Furthermore, Appendix A shows the moments can be estimated more precisely in comparison with cumulants and hence, the moments are preferable.

Reduce the error of series remainder in first characteristic function
The error of series remainder can be reduced by changing the scale of all ICs in first characteristic function, while it is not true in second characteristic function. In the rest of this section the proposed method to reduce the error of series remainder is explained.
Estimation of the first characteristic function series only by its first four moments introduces a new type of error which is a direct result of the remainder, sum of the terms of order higher than four, of the series. Equation (3) can be rewritten as: Where R(ω) is the remainder of series. It is shown in the Appendix B that 0.125ω 4 s 4 max is a supremum for the summation of high order moments. Therefore changing the scale of all ICs matrix elements to values smaller than one can be used to reduce the remainder of first characteristic function (It is worth mentioning that the moments are affected by changing the scale Where a = 1 λs max offline (12) λ > 1 is a real value. Changing the scale result in s max new < 1 and hence decreases the value of 0.125ω 4 s 4 max new as quartic. Consequently, remainder (error) decreases as quartic. However, it can be seen in the last paragraph of section 3.1, that decline in lower order moments is smaller than that of higher order moments (decline in the first, second, and third moments are proportional, quadratic, and cubic, respectively). Thus the first characteristic function is more influenced by the lower order moments as the scale declines.
It should be mentioned that choosing a large λ, helps the constraints of inequalities (B-3) and (B-4) to be satisfied. But choosing a large λ yields to an increase in calculation error, round-off error, and a decline in the weights of the third and fourth moments in characteristic function estimation. Choosing a large ω, on the other hand, results in a dramatic increases in the supremum magnitude. It may also cause the constraints of inequalities (B-3) and (B-4) not to be satisfied. But choosing a small ω increases the calculation error, round-off error and decreases the weights of the third and fourth moments in characteristic function estimation. In practice, fault frequency, if known, would be a reasonable choice for ω and then λ can be chosen such that: The inequality (13) is a result of substituting Equation (12) in Equation (11) and satisfying the constraints of inequalities (B-3) and (B-4). Notice that choosing a λ much bigger than ω increase the round of error. Therefore, λ should be chosen close to ω.

THE PROPOSED ALGORITHM
In this section the proposed fault detection algorithm is discussed.

The offline part
Step1: The observation matrix X ∈ R m×(N+l) is obtained by the observed data. Then, the time lag l is chosen and the augmented matrix X L (X L ∈ R (l+1)m×N ) is constructed by augmenting columns X (k + 1), . . . , X (x + l) to column X (k) for k = 1, . . . , N. Then the augmented matrix normalized by the mean and standard deviation of each row.
Step5: Rearrange and divide matrix W ∈ R m(l+1)×m(l+1) into two parts: the dominant part, W d ∈ R d×m(l+1) , and the excluded part, W e ∈ R (m(l+1)−d)×m(l+1) , i.e. W = W d W e . d is number of rows of dominant part of W (see [11]). Note that the norm of rows of matrix W is used for this division; higher norm construct W d and lower norm construct W e . Rearrange and divide matrix B ∈ R m(l+1)×m(l+1) into two matrices B d ∈ R m(l+1)×d and B e ∈ R m(l+1)×(m(l+1)−d) as: Then the IC matrix space can be decomposed into two subspaces; the dominant part, S New d ∈ R d×N , and the excluded part Step6: chose k = 1 and evaluate the following fifteen fault detection statistics to monitor the process: Calculate the first, second, third, and fourth moment statistics and the first characteristic function statistic at frequency ω 0 for the elements of ICs vector s New d (k), I φ dω 0 (k) and I μ id (k) i = 1, 2, 3, 4, as follows: Where Which s Newd (k), is the r th element of vector s Newd (k) Calculate the first, second, third, and fourth moment statistics and first characteristic function statistic at frequency ω 0 for the elements of ICs vector s Newe (k), I φ eω 0 (k) and I μ i e (k) i=1,2,3,4, as follows: Where Which s Newe (k) r is the r th element of vector s Newe (k). Similar to the aforementioned procedure, the first four moment statistics and the first characteristic functions statistic at frequency ω 0 for the elements of vector e(k), I φ nsω 0 (k) and I μ ins (k) i = 1, 2, 3, 4, need to be calculated (For the nonsystematic part) as: Where and e(k) r is the r th element of vector e(k). if k < N set k = k + 1 and go back to step 6.
Step7: Calculate confidence limits of the fifteen fault detection statistics by KDE (99% or 95%) to acquire the control limit.

The online part
Step1: Construct the new augmented observation vector and it is normalized by the offline part mean and standard deviation of each row as in step 1 of the offline part to obtain x online (k).
Step2: Compute vectors s online (k) and s eonline (k) by prior W d and W e as: Step3: Compute the fifteen fault detection statistics as in step 6 of the offline part for the new vector of ICs as: • Calculate the first, second, third, and fourth moment statistics and the first characteristic function statistic at frequency ω 0 for the elements of ICs vector s don (k), I φ ondω 0 and I μ i ond (k) i = 1, 2, 3, 4, as follows: Where Which s don (k) r is the r th element of vector s don (k).
• Calculate the first, second, third, and fourth moment statistics and first characteristic function statistic at frequency for the elements of ICs vector s eon (k), I φ oneω 0 (k) and I μ ione (k) i = 1, . . . , 4 Where Which s eon (k) r is the r th element of vector s eon (k).
• Similar to the aforementioned procedure, the first four moment statistics and the first characteristic functions statistic at frequency ω 0 for the elements of vector e on (k), I φ onms ω 0 (k) and I μ i omns (k) i = 1, 2, 3, 4, need to be calculated (For the non-systematic part) as: Where where e on (k) = x online (k) −x online (k) andx online (k) = Q −1 B d W d x online (k) and e on (k) r is the r th element of vector e on (k).
If these new fault detection statistics are out of the control limit, a fault has occurred.
Step4: Detect the root cause of the fault by calculating contribution's values of each sample in fault detection statistics.

Root cause diagnosis
Root cause diagnosis is as important as fault detection and needs to be done before the fault spreads through all loops and develops to a crucial level. In the ICA-based fault detection methods, the dependence between the ICs, in each vector, is minimized. As vol 33 no 1 January 2018 a result, the ICs which lead to increase in the fault detection statistics can be identified. In other words, contribution of the ICs in the value of statistics, by which the fault is being detected, can be determined which ICs is root cause of the fault. In other words: 1. If the first moments detect the fault in sample k, the ICs with bigger absolute value in sample is the root cause of the fault.
2. If the second moments detect the fault in sample k, the ICs that have bigger squared value in sample k is the root cause of the fault.
3. If the third moments detect the fault in sample k, the ICs with bigger cubed value in sample k is the root cause of the fault.
4. If the forth moments detects the fault in sample k, the ICs with bigger ICs to the power of four value in sample k is the root cause of the fault.

If the first characteristic function detects the fault in sample
k, the ICs sample that have bigger value in Equations (18) or (22) in sample k is root cause of the fault.
Notice that all absolutely integrable functions of two random variables became independent if these two random variables are independent, also expectation have linearity property.
Next step, after finding the ICs that is the root cause of the fault, is to find the loop which made the ICs. It is of crucial importance to find the loop without disturbing the system normal operation. Knowledge of the system properties is the key to answer this question. For example, the loop can be found if the operator can detect a characteristic, such as frequency, that shows which loop made which ICs in the offline part.

Analysis of the simultaneous application of the moments and the first characteristic function
The first characteristic function statistic is not sufficient for fault detection because: • The faults will be detected when the first characteristic function statistic in the online part differs from that of the offline part. In certain circumstances, however, increase in some terms and decrease in other terms of the first characteristic function cancel out each other's effect on the first characteristic function and hence the first characteristic function statistic does not change significantly. As a result, the faults cannot be detected.
• The first characteristic function statistic is a function of frequency ω. Hence, the first characteristic function statistic changes if the value of ω changes. In other words, two completely different Fourier transform functions can have identical values in some frequencies. Thus, faults will not be detected when, at one chosen frequency, the first characteristic function in the online part is equal to that of the offline part.
The first four moment statistics are not sufficient for fault detection because: 1. The moments-based fault detection methods fail to detect the occurrence of the faults if all four moments are inside the control limit. To improve the effectiveness of the fault detection methods in such situations, the first characteristic function along with the first four moments can be used as statistics to determine the control limit. The reason is that, according to Equations (18) and (22), the first characteristic function will not reach its maximum if all four first moments are at their maximum values. Therefore, the maximum value of the first characteristic function in the offline part is defined as new constraint to determine the control limit.

Limitation of the proposed method
While the proposed method is powerful in detecting a class of faults which the former methods cannot detect, it has two limitations: 2. If the faulty ICs vector in the online part have elements with values greater than λs max o f f line , it is possible that constraints of inequalities (B-3) and (B-4) not being satisfied. As a result, the effects of higher order moments overweight the effects of lower order moments in first characteristic function estimation and hence fault detection may fail. In such circumstances, the effectiveness of the method can be improved by choosing a small frequency ω 0 . However, that leads to small elements in the ICs vector and therefore reduces the calculation accuracy.
3. Applying the proposed method in a short time requires more powerful computational facilities.

SIMULATION RESULTS
The former and proposed fault detection methods are applied to two systems and the simulation results are used to validate the algorithm.

The simple system
The proposed fault detection method based on DICA with new statistics, described in section 4, is applied to the system suggested by [39] and the simulation results are used to validate the algorithm.
The system is comprised of two interfering loops and hence faults propagation is inevitable. Each loop consists of a plant and a PI controller. To model the sensor fault, a block with a limited time output is added to the lower loop. In DICA-based fault detection methods, oscillations that occur in both offline and online parts cannot be detected by fault detection statistics. It, in basically, means that faults that occur in the offline part, outliers, cannot be detected in the online part because it is assumed as normal condition. To show such a situation, two sinusoidal signals are added to the feedback signals. Both signals are of amplitude

Performance of first characteristic function and its terms in comparison with square of 2-norm as statistics to determine the control limit
In this section performance of the proposed method is compared with that of the former method in which square of 2-norm is used to determine the control limit. In Figure 2, parts a, b, c, and d show, respectively, the first, second, third and fourth moment statistics of the ICs and Part e shows the first characteristic function statistic of the ICs. The values of ω 0 and λ are chosen as 1 and 10, respectively. Part f presents the square of 2-norm of ICs as the former statistic. Parts g and h are two control signals sampled from system and Part i shows the joint distribution of the dominant ICs in the online part. In this part the normal operation ICs are shown by red dots and the faulty ICs are shown by green squares. Fault(s) will be detected if amplitude of each statistic exceeds the value of corresponding statistic in the normal condition that made control limit. It means fault/faults will be detected if some samples of either of statistics, the first four moment statistics and the first characteristic function statistic in the proposed method, or square of 2-norm in the former method, are bigger than the same statistics in the offline part obtained from normal operation condition. A dashed line represents the maximum value of statistics in the online part. The first, second and the fourth moment statistics, Part a, b and d, although not easily, detect the sensor fault but fail to detect the actuator fault. Whereas, the third moment statistic, Part c, detects both sensor and actuator faults. Also, the first characteristic function statistic only detects the sensor fault, Part e. On the other hand, it is clear that the former method detects none of those faults, see Part f . Part i clearly shows that the faulty ICs, green squares, are located among the normal ICs shown by red dots. Also it is noticeable that the faults do not have a significant influence on the first and second control signals; see Parts g and h. In other words, the faults cannot be detected by monitoring the signals that are not processed by the proposed fault detection method. The Simulation results emphasis that the proposed method is a more effective fault detection method than the former method.

The root cause diagnosis
Simulation results are used to verify the ability of the proposed method in root cause diagnosis. For better visibility, the amplitudes of both faults are increased by 200%, the actuator fault is shifted by 10 seconds. In other words, the actuator fault is assumed to occur between 180 and 185 seconds which correspond to samples from 1600 to 1700. The results are shown in Figure 3. In this Figure,

Comparison between the first and second characteristic functions
In this section the effects of using the first characteristic function statistic and the first four moment statistics on the performance of the fault detection algorithm is compared with that of the second characteristic function statistic and the first four cumulant statistics. Absolute value of 1 − |ψ ω 0 (k)| and absolute value of first four cumulants are, respectively, called second characteristic function statistic and cumulant statistics, see Equations (15) and (16). The amplitude of the sensor fault and the actuator fault are, respectively, decreased to 0.001 of the tolerance amplitude of the feedback signal and to 0.0003 of the control signal. Also, λ is set to 2. In Figure 4, Parts a, c, e, and g are, respectively, the first, second, third, and fourth moment statistics of ICs and Part i presents the first characteristic function statistic of the ICs. Parts b, d, f , and h are absolute values of the first, second, third and fourth cumulant of ICs, respectively and Part j shows the second characteristic function of ICs.
Parts a and b are identical. However, in Part c the amplitude of the faulty statistics is slightly bigger than the statistics in Part d and exceeds the corresponding statistics maximum in the offline part (dashed line). Comparing Parts e and g with f and h, respectively, shows that the difference in the amplitude of faulty statistics becomes more obvious as the order of moments and cumulants increases. It is obvious that by using the moments as the statistics to determine the control limits, fault detection method is able to identify both faults, whereas by using the cumulant as the statistic to determine the control limit, the fault detection method only detects the sensor fault. Thus it is obvious that detecting faults by moments yields much better result than by cumulants. Moreover, comparison between Parts i and j reveals the benefit of using the first characteristic function over the second characteristic function.

5.1.4
Effect of frequency ω 0 and λ on the first characteristic function statistic to determine the control limit In this section, the effect of frequency ω 0 and λ on the performance of first characteristic function statistics is investigated. In  Figure 5 clearly illustrates that the first characteristic function statistic values change when the faults occur. It can be seen, moving in each row from left to right in Figure  5, that with a fixed λ, choosing a bigger ω 0 leads to a bigger change in amplitude of first characteristic function statistics and hence a smaller round of error. Also Figure 5, parts c, d and h, shows that choosing λ and ω 0 in a way that inequality ω 0 < λ is not satisfied, yield to first characteristic function estimation with amplitude greater than 1 which is an invalid estimation because On the other hand, moving upward in each column of figure 5, choosing a smaller λ while keeping the ω 0 fixed, results in a bigger change in amplitude of the first characteristic function statistics and thus a decrease in the round of error. A brief review of Figure 5 emphasizes that choosing λ bigger but close to ω 0 , leads to more observable changes in the amplitude of first characteristic function statistics and hence smaller round of error.
In summary, first ω 0 need to be selected, not too large value to lead to a decrease in the value of moments statistics as λ should be a supremum for ω 0 . Then λ should be chosen bigger than but close to ω 0 to decrease the round of error.

The Couple tank system
A simple nonlinear model of the coupled tank is provided in Ko et al. as [40]: Where H 1 , H 2 are the height of fluid in tank 1 and 2, respectively. Q i1 and Q i2 are the pump flow rate into tank 1 and 2, respectively. α 1 , α 2 and α 3 are constants. A 1 , A 2 are the cross sectional area of tank 1 and 2, respectively. Equations (42) and (43) make a system with two interfering loops, thus the fault propagates through the loops. In order to decrease the error, a lag controller is designed for each loop; G C1 = s+0.9 s+0.002 and G C2 = s+0.8 s+0.001 for tank 1 and 2, respectively. Two sinusoidal signals are added to output signals in order to model the waves on the surface. The amplitude of the sinusoidal signal is 0.03 and its frequency is 0.2Hz in the control loop of tank 1. The amplitude of sinusoidal signal in the control loop of tank 2 is 0.8 and its frequency is 0.5Hz. Also, a white noise with the power of 0.0002 is added to each loop.
The simulation runs for 200 seconds with a sample time of 0.05 which means 4000 samples (200/0.05) are accumulated.

Comparison Results
To   Figure 7, parts a, b, c, and d show, respectively, the first, second, third and fourth moment statistics of the ICs and Part e shows the first characteristic function statistic of the ICs. The values of ω 0 and λ are chosen as 1 and 2, respectively.
The first moment statistic, Part a, only detects the first fault. The second and the fourth moment statistics, Parts b and d, detect the first and second faults. Whereas, the first characteristic function statistic and third moment statistic detect all three sen-  This simulation is run 30 times without any changes in the parameters and the proposed method detects 84 faults from 90 faults that occurred, whereas the former method detects 76 faults from 90 faults that occurred.
The simulation results emphasis that the proposed method is a more effective fault detection method than the former method.

5.2.2
The root cause diagnosis Simulation results are used to verify the ability of the proposed method in root cause diagnosis. For better visibility, the amplitudes of all faults are increased by 250%. The results are shown in Figure 8. In this Figure, Parts a, b, c, d and e are, respectively, the first four moments statistics and the first characteristic function statistic of the ICs, and all of them detected the three faults. Part f Basically if it can be identified that tank2 has oscillations with higher frequency in the offline part, one can detect the root cause as the graph in part h has higher frequency.

CONCLUSIONS
This paper has introduced a fault detection method using the DICA with new statistics to determine the control limit. The benefit of the proposed algorithm is the increase in fault detection ability by estimating the first characteristic function and four first moments to determine the control limit indices. It has shown that the proposed method can detect a class of faults that their pdf, providing that its change can be estimated by low order moments of the first characteristic function, or lower moments are different from those of the normal ICs. In contrast to the ICA-based fault detection methods which use square of 2-norm as statistic to determine the control limit, the proposed approach can efficiently detect faulty samples although they are among normal samples. Simulation results are used to verify the effectiveness of the proposed methodology.

Appendix A
Propagation of error is calculated by the method described in [41,42]. A common formula among engineers and experimental scientists to calculate propagation of error, the variance formula (see [43]) is Equation (A-1). For a function of i uncorrelated quan-tities, κ = f (μ 1 , μ 2 , . . . , μ i ), or by neglecting the coloration between quantities the propagation of error can be calculated as: Where κ i is propagation of error in i th cumulant (κ i ) and μ i is error in i th moment (μ i ) ( μ i consist of the error due to calculation of moment and the error due to measurement of signals), providing μ i and μ j i = j are uncorrelated.
The Equation (7) shows that in calculation of κ i , always the first term of right hand side of equation is μ i . Therefore in Equation (A-1) the term under root is always begin with ∂ f ∂μ i μ i 2 and the other terms are nonnegative. Hence Where a 2 is positive term. Therefore κ i ≥ μ i .
This shows that propagation of errors in cumulants are bigger than the errors of moment when i > 1. It is worth mentioning that there is not any propagation of error in estimation of moments because they are computed directly.