Estimating System State through Similarity Analysis of Signal Patterns

State prediction is not straightforward, particularly for complex systems that cannot provide sufficient amounts of training data. In particular, it is usually difficult to analyze signal patterns for state prediction if they were observed in both normal and fault states with a similar frequency, or if they were rarely observed in any system state. In order to estimate the system status with imbalanced state data characterized by insufficient fault occurrences, this paper proposes a state prediction method that employs discrete state vectors (DSVs) for pattern extraction and then applies a naïve Bayes classifier and Brier scores to interpolate untrained pattern information probabilistically from the trained patterns. Each Brier score is transformed into a more intuitive value, termed state prediction power (SPP). The SPP values represent the reliability of the system state prediction. A state prediction power map, which visualizes the DSVs and corresponding SPP values, provides a more intuitive way of analyzing state predictions. A case study using a car engine fault simulator was conducted to generate artificial engine knocking. The proposed method was evaluated using holdout cross-validation, defining specificity and sensitivity as indicators to represent state prediction success rates for no-fault and fault states, respectively. The results show that specificity and sensitivity are very high (equal to 1) for high limit values of SPP, but drop off dramatically for lower limit values.


Introduction
In recent years, products and industrial systems have increased in complexity due to the implementation of various technologies for meeting diverse demands. The possibility of unexpected faults has also increased, potentially resulting in loss of brand value. Hence, a number of maintenance strategies have been introduced for assuring system reliability [1]. Among these strategies, condition-based maintenance (CBM) has been widely applied in industry owing to its low-cost advantage over conventional maintenance strategies [2]. CBM reduces costs by conducting maintenance tasks only when abnormal system behavior is detected. Detection of abnormal behaviors is carried out using information extracted through condition monitoring. In general, condition monitoring involves various methods for extracting the relationship between sensor data and the state of the physical system [3]. Model-based fault detection approaches aim to describe the actual status of a system by mathematical system equations. Furthermore, several graphical formal models such as automata, Petri nets, and bond graphs have been developed to predict faults in electro-mechanical systems [4]. To do this, collected operational data are analyzed as a directed graph. However, growing data complexity also increases the number of patterns that cannot be trained. For instance, when patterns extracted from different states of the system have the same distance from an unknown pattern, distance-based classifiers cannot clearly determine the system state.
Inspired by this weakness of distance-based classifiers, this paper proposes a new pattern classifier that can robustly classify the system states with multivariate data, even though the given pattern is not trained. To do so, the proposed method applies a probabilistic approach and scoring method to supplement information about the untrained pattern. We call this the "state prediction method" since this paper aims to estimate the state of an electromechanical system using multivariate data. In the remainder of this paper, the detailed procedure of the proposed method is presented with a case study to evaluate its performance.


State Prediction Method
This section presents the three-step procedure of system state prediction as shown in Figure 1. First, raw sensor signals are conditioned and filtered to reduce measurement noise, and PCA-based dimension reduction for the multi-sensor signals is conducted. In some cases of data-driven fault diagnostics, the very limited number of fault occurrences in recent highly reliable systems has hindered significant pattern identification despite a large amount of collected sensor data, a so-called data imbalance problem. In this case, it is sometimes necessary to utilize all available sensor information for pattern mining. Among these sensors, the more informative sensors or principal components can be extracted by dimension reduction techniques. Second, discretization is performed by time segmentation and value range partitioning for the shortlisted sensors or chosen principal components (PCs). A DSV is then obtained in such a way that it describes the state of the system in a specific time segment. Fault patterns are extracted by analyzing the DSVs. Finally, the state prediction power of each pattern is evaluated based on naïve Bayes approximation and a probabilistic scoring rule.
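As an illustration of this first step, the dimension-reduction idea can be sketched with a plain SVD-based PCA. The function name and the variance-ratio output are illustrative assumptions for this sketch, not the paper's implementation:

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project conditioned sensor data onto its leading principal components.

    X: array of shape (n_samples, n_sensors), one row per time step.
    Returns the PC scores and the fraction of variance each PC explains.
    """
    Xc = X - X.mean(axis=0)                 # center each sensor channel
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T       # coordinates in PC space
    explained = (S ** 2) / np.sum(S ** 2)   # variance ratio per PC
    return scores, explained[:n_components]
```

The shortlisted PCs (two in the case study of this paper) would then replace the raw sensor channels as inputs to the discretization step.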


Pattern Definition Using Discretized State Vector
Before introducing the proposed method in detail, it is necessary to define the characteristics of the pattern used. The concept of DSVs from previous research [19] is applied to pattern extraction from multi-sensor signals owing to its simple representation of multivariate time series data, as shown in Figure 2. Assume that m sensor signals are acquired from a target system, from 1 to n measurements (i.e., the length of the time series of each sensor signal); then the data can be written as a matrix X whose row and column sizes are m and n, respectively. Time series segmentation with a fixed-length time window, i.e., w measurements, is performed for data discretization, as illustrated by the yellow vertical lines in Figure 2a.
Figure 2. Transformation of multi-sensor signals to a series of discretized state vectors [19].
Sensor values are subdivided into a predefined number k of bins for data digitization. For example, three bins are specified by two cut-points, the red dotted lines in Figure 2a. In this study, the equal-frequency binning method was applied for cut-point determination as follows: first, each sensor's data are sorted in ascending order, and the cut-points are then determined so that the number of measurements included in each bin is identical [22]. Appropriate labels are finally allocated to each bin by considering the value of each feature in the sensor signal. For example, l_12 is the label for the second bin of Sensor 1 in Figure 2b. The average value of the w measurements in a time segment is used for label specification. Figure 2c shows a series of 20 DSVs for two sensor signals. In other words, a DSV is a vector consisting of all the sensor labels in a time segment.
In this way, the time series for the multivariate data is transformed into a set of DSVs. The DSV is regarded as the pattern for the purposes of the proposed method. It is important to note that DSVs that only occur in the no-fault state are defined as no-fault patterns, while DSVs that only occur in fault states of the system are defined as fault patterns.
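The segmentation and equal-frequency binning described above can be sketched as follows. The function name and the use of integer bin indices in place of symbolic labels such as l_12 are assumptions made for this illustration:

```python
import numpy as np

def to_dsvs(X, w, k):
    """Transform an m-by-n multi-sensor matrix into a series of DSVs.

    X: array of shape (m sensors, n measurements); w: time-window length;
    k: number of bins per sensor. Returns an array of shape (n // w, m)
    where each row is one DSV of per-sensor bin labels 0..k-1.
    """
    m, n = X.shape
    n_seg = n // w
    # average value of the w measurements in each time segment
    seg_means = X[:, :n_seg * w].reshape(m, n_seg, w).mean(axis=2)
    dsvs = np.empty((n_seg, m), dtype=int)
    for i in range(m):
        # equal-frequency cut-points: the k-1 interior quantiles of sensor i
        cuts = np.quantile(X[i], np.linspace(0, 1, k + 1)[1:-1])
        dsvs[:, i] = np.digitize(seg_means[i], cuts)
    return dsvs
```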

Probabilistic Scoring Rule for State Prediction
Using the construction of DSVs outlined above, the proposed state prediction method aims to classify the system state when the DSVs are trained with a small amount of data that is insufficient to account for all the behaviors of the system. The state prediction method is designed to compensate for the lack of prior information by probabilistically calculating a score that represents the relationship between DSVs and given states of the system [23]. Each sensor's characteristics are assumed to be independent of all the others, since the sensors are physically isolated from each other. Under this assumption, a naïve Bayes classifier is applied that considers the independence of the elements to be classified. Naïve Bayes classifiers have been applied to state prediction for mechanical systems such as centrifugal pumps [24] and three-phase induction motors [25].
The first step in applying a naïve Bayes classifier with DSVs is calculating the prior information of each state from the training data. Consider a system consisting of n sensors. Let N and F be the sets of no-fault and fault-state DSVs, respectively; let V_m = (l_{1,m}, l_{2,m}, ..., l_{n,m})^T be the mth DSV of the given system, where l_{i,m} is the determined label for the mth time segment in the ith sensor's data (i = 1, 2, ..., n). P(N) and P(F) represent the probabilities of a no-fault and a fault pattern, respectively. When the probabilities are not given, they are estimated by calculating the ratio of observations of each state to the total number of observations. For example, if the ratios of the fault and no-fault states in the given data are 70% and 30%, the probabilities are assumed to be 0.7 and 0.3, respectively.
Next, the conditional probabilities of the fault and no-fault states given observed DSVs are calculated for the naïve Bayes classifier. P(F|V_m) is the conditional probability of a fault state given V_m, while P(V_m|F) is the conditional probability of V_m given a fault state. According to Bayes' theorem,

P(F|V_m) = P(V_m|F) P(F) / [ P(V_m|F) P(F) + P(V_m|N) P(N) ],    (1)

where P(N|V_m) is defined as the remainder of subtracting P(F|V_m) from 1:

P(N|V_m) = 1 − P(F|V_m).    (2)

P(V_m|F) is calculated by multiplying the conditional probabilities of each label in V_m, given a fault state:

P(V_m|F) = ∏_{i=1}^{n} P(l_{i,m}|F).    (3)

In addition, Laplace smoothing is applied to prevent the conditional probabilities from becoming zero before calculating the likelihoods of the DSVs. The probabilities are calculated by assuming that the occurrences of the unobserved DSVs have probabilities summing to 1. Figure 3 shows an example of calculating the conditional probabilities based on the observations, as well as the state prediction procedure using these probabilities. The dataset consists of two sensors, and each sensor's data are discretized with three labels. Therefore, nine DSVs in total can be generated, and their corresponding conditional probabilities are obtained using Equations (1)-(3).
The states of the four test DSVs, named Test_1, Test_2, Test_3, and Test_4, are classified using the obtained probabilities. Test_1 and Test_4 are classified as fault states, while Test_2 and Test_3 are classified as no-fault states. Although Test_3 is classified as no-fault, there is no significant difference between P(N|V_8) and P(F|V_8). While the observation table shows that V_8 only appears in the no-fault state, the appearance of each label, or discretized component, made the probabilities of both states similar. This is caused by taking into account the independence of each label when calculating the conditional probabilities. Hence, the probabilities are more influenced by the state bias of each label in the DSVs than by the state bias of the DSVs themselves.
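A minimal sketch of this classifier, with add-one Laplace smoothing on the per-label likelihoods, might look as follows. The class and method names are illustrative, and states are coded 0 for no-fault and 1 for fault:

```python
import numpy as np

class DSVNaiveBayes:
    """Naive Bayes over DSVs: labels are assumed independent per sensor."""

    def fit(self, dsvs, states, k):
        dsvs, states = np.asarray(dsvs), np.asarray(states)
        self.n_sensors = dsvs.shape[1]
        # priors P(N), P(F) estimated from the ratio of observations
        self.prior = np.array([(states == s).mean() for s in (0, 1)])
        # Laplace-smoothed likelihoods P(l_i = v | state): counts start at 1
        self.like = np.ones((2, self.n_sensors, k))
        for s in (0, 1):
            sub = dsvs[states == s]
            for i in range(self.n_sensors):
                for v in range(k):
                    self.like[s, i, v] += np.sum(sub[:, i] == v)
            self.like[s] /= self.like[s].sum(axis=1, keepdims=True)
        return self

    def posterior_fault(self, v):
        """P(F | V_m) via Bayes' theorem with per-label likelihoods."""
        logp = np.log(self.prior).copy()
        for s in (0, 1):
            for i, lab in enumerate(v):
                logp[s] += np.log(self.like[s, i, lab])
        p = np.exp(logp - logp.max())  # stabilized unnormalized posteriors
        return p[1] / p.sum()
```

Because of the independence assumption, an untrained DSV still receives a non-degenerate posterior from the smoothed label statistics, which is exactly the property the ambiguous-DSV discussion above relies on.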
Therefore, our state prediction rule needs to consider the error between the prediction and the actual state. The concept of a probabilistic scoring rule has been applied to distortion-free calculation of this error based on probabilistic assurance. Among the various scoring methods, the Brier score has been adopted in much research in order to emphasize the relationship between the system state and each DSV [26]. The sum of the squared prediction errors for the entire set of predictions is calculated while the actual state is known [27]. The definition of the Brier score is as follows:

BS = (1/T) ∑_{i=1}^{T} (p_i − δ_i)²,    (4)

where T represents the total number of prediction events, p_i denotes the probability of the ith prediction, and δ_i describes the actual event corresponding to the ith prediction. The value of δ_i is set to 1 for the event predicted by p_i, and 0 otherwise [28]. The original scoring rule, which is negatively oriented, is given a positive orientation by subtracting the Brier score from 1 so as to allow more intuitive insight [29].
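As a concrete sketch, the negatively oriented score and its positively oriented form can be computed as below; the function names are assumptions for this illustration:

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities p_i and
    actual events delta_i (1 if the predicted event occurred, else 0)."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))

def oriented_score(probs, outcomes):
    """Positively oriented form: 1 - BS, so higher is better."""
    return 1.0 - brier_score(probs, outcomes)
```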
As shown in Figure 4, the Brier score of a DSV that correctly predicts the state is proportional to the conditional probability of the DSV given that state. On the other hand, the Brier score of a DSV that incorrectly predicts the state is inversely proportional to the conditional probability of the DSV given that state. The Brier scores of DSVs are thus converted into state prediction power (SPP) values, defined for more intuitive decision making:

SPP_no-fault = 1 − BS_no-fault,    (5)
SPP_fault = −(1 − BS_fault),    (6)

where SPP values range from −1 to 1. The state decision is accomplished by comparing the SPP of the given DSV with a predefined SPP limit value. The limit values for SPP_no-fault and SPP_fault are predefined so as to have the same absolute values but opposite signs. For example, if the limit value for SPP_fault is defined as −0.6, that of SPP_no-fault is defined as 0.6. The state prediction of the given DSV is considered to be reliable only if SPP_fault is lower than −0.6 or SPP_no-fault is higher than 0.6. Figure 4 shows the SPP values that are calculated from the Brier scores of the DSVs. Among the DSVs, V_8 is determined to be a fault DSV when its SPP value is compared with the limit value, though it was determined to be an ambiguous DSV by the naïve Bayes classifier.
However, V_3 and V_7 are determined to be ambiguous DSVs when comparing their SPP values, even though they were determined as fault and no-fault DSVs, respectively, by the naïve Bayes classifier.
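The resulting three-way decision (reliable no-fault, reliable fault, or ambiguous) reduces to a simple thresholding step, sketched below. Only the thresholding is shown here, and the function name is illustrative:

```python
def spp_decision(spp, limit=0.6):
    """Classify a DSV from its SPP value in [-1, 1].

    Positive SPP indicates no-fault evidence, negative SPP fault evidence;
    predictions inside (-limit, limit) are treated as unreliable.
    """
    if spp >= limit:
        return "no-fault"
    if spp <= -limit:
        return "fault"
    return "ambiguous"
```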
After obtaining the SPPs of all the DSVs, a visualization of the state prediction, termed the state prediction power map, is suggested, as shown in Figure 5a. The map is rearranged according to the SPP values of the DSVs, as shown in Figure 5b, providing more intuitive information. The no-fault and fault-state DSVs that are determined to be reliable are marked with O and X, respectively.

Engine Fault Simulator
The proposed state prediction method was applied to data collected by a car engine fault simulator. The simulator replicates the occurrence of faults in a car engine by changing sensor values that strongly affect the operating status of the engine. As shown in Figure 6a, no-fault and fault system states are artificially generated by changing the following sensor values: manifold air pressure (MAP), throttle position sensor (TPS), intake air temperature (IAT), water temperature sensor (WTS), and four injectors. Figure 6b shows the data collection system, which consists of 40 sensors, and the data acquisition module (NI-DAQ). Sensor data were collected at a sampling rate of 10 Hz, with an accuracy of 0.04% gain error at room temperature, a timing accuracy of 50 ppm of the sample rate, and a timing resolution of 12.5 ns. A single analog-to-digital converter of the NI-DAQ samples multiple sensor signals through scanning, buffering, and conditioning processes to minimize measurement noise such as missing signals and abnormal peaks.

The artificial fault generation scenario is to replicate the phenomenon of engine knocking, which is caused by a shortage of fuel in the mix. As is well known, the MAP sensor is installed in the automobile electronic control system, which determines the fuel supply to the engine. Engine knocking is generated when the value of MAP decreases, potentially causing the engine to make a loud knocking or "pinging" noise.
In each trial of the experiment, data collection begins immediately after the engine is turned on. After 240 s of engine operation, the MAP value is decreased for 30 s via the control dial. The MAP value is then increased to the no-fault state for 20 s, and finally the engine is turned off. Time information for each stage is recorded for every trial. In total, 430 experimental trials were performed.
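Given the recorded stage times, each time segment of a trial can be labeled for training. The helper below is a hypothetical sketch that labels a segment as fault when its midpoint falls inside the MAP-reduction window (240-270 s at the 10 Hz sampling rate described above); the function name and midpoint rule are assumptions:

```python
def label_segments(n_segments, w, fs, fault_start=240.0, fault_end=270.0):
    """Return 0 (no-fault) / 1 (fault) per time segment.

    w: measurements per segment; fs: sampling rate in Hz; the fault
    window is given in seconds from engine start.
    """
    labels = []
    for s in range(n_segments):
        t_mid = (s * w + w / 2) / fs  # segment midpoint in seconds
        labels.append(1 if fault_start <= t_mid < fault_end else 0)
    return labels
```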


Similarity Analysis
A similarity analysis is performed to demonstrate the effectiveness of the proposed method. The analysis is conducted as follows. First, 6 sensors are selected (out of a total of 40) that have been proven to represent the state of the simulator well. A data set for each sensor is then transformed into a set of DSVs. Repeated holdout cross-validation is applied for estimating the performance of the proposed method, since training data are not given. This study applied holdout rather than k-fold cross-validation since the amount of data is large [30]. Twenty percent of the DSVs are randomly selected as training data, and SPPs are calculated for the data as shown in Figure 7. The remaining DSVs are then classified using the calculated SPP values. The state prediction is performed through 20 repeats of the cross-validation, with the training and test DSVs randomly selected for each trial.
The SPP limit value is set as 0.85 (i.e., the upper and lower limits are set as 0.85 and −0.85). If the SPP for the given DSV is larger than 0.85, the DSV is determined as a reliable state predictor for the no-fault state, and as a reliable state predictor for the fault state when the SPP is lower than −0.85. The responses of the given DSVs can be depicted as shown in Figure 7. "True negative" and "true positive" are the responses when the states of the given DSVs are correctly predicted as no-fault and fault, respectively. A prediction error that judges a no-fault DSV as being in a fault state is called a false alarm (Type I error). A prediction error that assesses a fault-state DSV as being in a no-fault state is called a miss (Type II error). The trial results are summarized in the next section.
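The repeated random splitting can be sketched as a small generator. The 20%/80% proportions and 20 repeats follow the text, while the function name and seeding are illustrative:

```python
import numpy as np

def holdout_splits(n, n_repeats=20, train_frac=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated holdout validation."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n))
    for _ in range(n_repeats):
        idx = rng.permutation(n)  # fresh random split on each repeat
        yield idx[:n_train], idx[n_train:]
```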


Fault Detection with the SPP
The results of the experimental trials are shown in Table 1. To assess the performance of SPP in the fault detection problem, sensitivity and specificity were defined as performance indicators. Sensitivity is defined as the ratio of true positives to the sum of true positives and misses. Specificity, conversely, is defined as the ratio of true negatives to the sum of true negatives and false alarms. In other words, sensitivity and specificity measure the success rate of state prediction for fault and no-fault states, respectively. Therefore, higher values of the two indicators indicate higher performance in fault detection. As shown in Table 1, the mean values of both sensitivity and specificity were the highest, equal to 1, when the limit value was set to 0.85. 'Min' in Table 1 indicates the lowest performance within the 20 experimental trials, showing that a 2.4% miss rate and a 1.5% false alarm rate occurred in certain validation datasets. A visual representation is provided in the form of a gradual similarity map, which rearranges DSVs according to SPP value to describe the generated state prediction result in more detail. For an intuitive understanding of the concept, Figure 8 shows the two-dimensional state prediction power map generated using DSVs from two sensor signals. SPP values are placed on a single scale from −1 to +1, represented as a color map.
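The two indicators follow directly from the confusion counts. For instance, the worst-trial rates reported above (a 2.4% miss rate and a 1.5% false alarm rate) correspond to a sensitivity of 0.976 and a specificity of 0.985; the counts below are illustrative values chosen to reproduce those rates:

```python
def sensitivity_specificity(tp, misses, tn, false_alarms):
    """Sensitivity: fault-state success rate; specificity: no-fault rate."""
    sensitivity = tp / (tp + misses)
    specificity = tn / (tn + false_alarms)
    return sensitivity, specificity
```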


Discussion
In order to analyze the effects of the limit value on the calculated SPP, the trends of sensitivity and specificity in fault detection were investigated as the limit value varied from 70% to 99%; the corresponding fault detection performance is illustrated in Figure 9. Both performance indicators showed a very high rate of correct state predictions, near 0.99, when the limit value was set above 0.93. Sensitivity varied between 0.88 and 1.0 for limit values between 0.75 and 0.93, while specificity showed less variation over the same range. However, when the limit value was set to 0.74, sensitivity and specificity dropped dramatically, to 0.58 and 0.68, respectively, and both indicators remained consistently below 0.75 for limit values lower than 0.74. It is important to note that two significant principal components (PCs) were extracted through PCA from the 40 original sensor signals, and these two PCs were used as input signals for fault pattern extraction. In general, the DSVs generated from the two chosen PCs are more informative for classifying fault and no-fault states than DSVs obtained directly from raw sensor signals. Therefore, below the critical threshold in the limit value (about 75% in Figure 9), ambiguous DSVs that had occurred frequently in both the no-fault and fault states began to appear among the fault and no-fault patterns, significantly degrading fault detection performance.
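The limit-value sweep described above can be sketched as follows. The decision rule (predict fault above +limit, no-fault below −limit, leave the rest undecided) and the synthetic SPP values are assumptions made for illustration, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SPP values in [-1, +1] for 1000 validation samples:
# positive SPP suggests a fault pattern, negative a no-fault pattern.
labels = rng.integers(0, 2, size=1000)            # 1 = fault, 0 = no-fault
spp = np.where(labels == 1,
               rng.uniform(0.3, 1.0, 1000),       # fault samples: high SPP
               rng.uniform(-1.0, -0.3, 1000))     # no-fault samples: low SPP

for limit in (0.70, 0.74, 0.85, 0.93):
    pred_fault = spp >= limit        # predict fault above +limit
    pred_ok = spp <= -limit          # predict no-fault below -limit
    tp = np.sum(pred_fault & (labels == 1))
    fn = np.sum(pred_ok & (labels == 1))          # misses
    tn = np.sum(pred_ok & (labels == 0))
    fp = np.sum(pred_fault & (labels == 0))       # false alarms
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    print(f"limit={limit:.2f}  sensitivity={sens:.3f}  specificity={spec:.3f}")
```

With cleanly separated synthetic SPP values the indicators stay at 1 across limits; the drops seen in Figure 9 arise only once ambiguous DSVs, occurring in both states, enter the fault and no-fault patterns.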
If the original 40 sensor signals were used directly, some of which may contain features irrelevant to changes in the system's status, gradual drops in sensitivity and specificity would be expected, since ambiguous DSVs with different occurrence ratios in the fault and no-fault states would be found more uniformly across limit values.
In short, this research presented a state prediction method for complex systems that cannot provide large amounts of training time-series data. In particular, the method is useful for datasets in which system faults occurred rarely or the DSVs of fault patterns overlapped substantially with no-fault ones. Conventional state prediction methods, which use distance-based pattern classifiers, make ambiguous predictions when the distances between a newly given unknown pattern and the trained patterns of the system states are the same. Therefore, the proposed state prediction method interpolates untrained pattern information probabilistically from the trained patterns. The SPP values represent the reliability of the state prediction, and are obtained by relating the conditional probabilities of DSVs given either state to the actual state of the system. As a result, as illustrated in Figure 8a, it is possible to estimate the state prediction power of an unfound DSV (for example, l13 li4) from the conditional probabilities, together with a measure of its reliability. The estimated prediction power of unfound DSVs also proved beneficial to the fault detection problem. In addition, by rearranging the labels according to the sum of the SPP values of the DSVs along each sensor axis (illustrated in Figure 8b), not only the SPP values but also distance information can be used to identify the prediction power of signal patterns.
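A minimal sketch of the probabilistic interpolation idea, assuming per-axis DSV labels combined under a naive-Bayes independence assumption: the label names, the toy training set, and the SPP-like score below are all hypothetical illustrations, not the paper's exact formulation.

```python
from collections import Counter

# Toy observations: (axis-1 label, axis-2 label, state); states: "fault" / "ok".
train = [
    ("l11", "l21", "ok"), ("l11", "l22", "ok"),
    ("l12", "l21", "ok"), ("l11", "l21", "ok"),
    ("l13", "l22", "fault"), ("l13", "l23", "fault"), ("l12", "l23", "fault"),
]

def axis_likelihoods(train, axis, state, smoothing=1.0):
    """P(axis label | state), Laplace-smoothed over the labels seen on that axis."""
    labels = {row[axis] for row in train}
    counts = Counter(row[axis] for row in train if row[2] == state)
    total = sum(counts.values())
    return {lab: (counts[lab] + smoothing) / (total + smoothing * len(labels))
            for lab in labels}

def spp_like_score(a1, a2, train):
    """Difference of unnormalized naive-Bayes posteriors, scaled to [-1, +1]:
    positive leans fault, negative leans no-fault, 0 is fully ambiguous."""
    scores = {}
    for state in ("fault", "ok"):
        prior = sum(row[2] == state for row in train) / len(train)
        p1 = axis_likelihoods(train, 0, state)
        p2 = axis_likelihoods(train, 1, state)
        scores[state] = prior * p1.get(a1, 0.0) * p2.get(a2, 0.0)
    total = scores["fault"] + scores["ok"]
    return 0.0 if total == 0 else (scores["fault"] - scores["ok"]) / total

# The DSV ("l12", "l22") never appears in training, yet a score can be
# interpolated from the per-axis statistics of the trained patterns.
print(spp_like_score("l12", "l22", train))
```

The smoothing keeps unseen per-axis labels from zeroing out a posterior, which is what lets an untrained DSV combination receive a usable score.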