Satellite Fault Diagnosis Using Support Vector Machines Based on a Hybrid Voting Mechanism

Satellite fault diagnosis plays an important role in enhancing the safety, reliability, and availability of satellite systems. However, the large number of parameters and the occurrence of multiple faults make satellite fault diagnosis challenging. Interactions between parameters and misclassifications among multiple faults increase the false alarm rate and the false negative rate. In addition, for each satellite fault there is not enough fault data for training, which degrades the performance of most classification algorithms. In this paper, we propose an improved SVM based on a hybrid voting mechanism (HVM-SVM) to deal with the problems of numerous parameters, multiple faults, and small samples. Experimental results show that HVM-SVM improves the accuracy of fault diagnosis.


Introduction
With the rapid development of aerospace engineering, the system structure of satellites has become more and more complex, while the requirements for reliability and safety keep rising. However, due to the complexity of the space environment and the limitations of satellite testing, abnormal operation and system failures often occur. Satellite fault diagnosis plays an important role in improving the reliability, safety, and availability of satellites, and it has become a focus of aerospace research.
Many methods for satellite fault diagnosis have been extensively studied, and these methods can be mainly divided into two categories. One approach is to use the model-based diagnosis for aerospace systems [1]. Another approach is the data-driven approach, also known as the data mining approach or the machine learning approach, which uses historical data to automatically learn a model of system behavior [2]. In model-based approaches, the Kalman filter, also known as linear quadratic estimation (LQE), is quite popular [3]. Although the model-based techniques have good performance in real-time fault diagnosis, their reliability will be decreased when the system nonlinearities, complexity, and modeling uncertainties increase.
Data-driven approaches mostly rely on real-time or historical data collected from sensors and measurements, so they do not need a detailed mathematical model of the satellite. Many contributions have been made in this area. Park et al. applied the BEAM (beacon-based exception analysis for multimissions) system to fault diagnosis in space shuttle main engine data [4]. They used the dynamical invariant anomaly detector (DIAD) to look for anomalies in a series of measurements observed over time. Schwabacher used two unsupervised anomaly detection algorithms, Orca and GritBot, to diagnose faults with data from two rocket propulsion systems [5]. Iverson's inductive monitoring system (IMS) [6] is another unsupervised learning system for fault diagnosis. It clusters the nominal data into classes that represent different modes of the system. Ogaji [7] extended multiple neural networks to isolate component and sensor faults using a cascaded network. Widodo and Yang [8] discussed the effectiveness of the support vector machine (SVM) in machine condition monitoring, and experiments show that SVM achieves high accuracy in fault classification [9].
Most machine learning methods, including pattern recognition and neural networks, need sufficient, high-quality sample data. For fault diagnosis, this means that the data need to cover all the failure modes and that similar modes must not contradict each other. However, as manufacturing quality improves, the failure rate of satellites decreases and fault samples become scarce, so these algorithms are not well suited to satellite fault diagnosis.
As mentioned above, satellite fault diagnosis is limited by the conditions of the space environment, and a large number of fault samples for training is not always obtainable in practice. Therefore, handling small samples and generalizing well are of great significance for satellite fault diagnosis. SVM [10], developed by Vapnik, is based on the structural risk minimization principle and has been widely applied since the 1990s in various fault diagnosis and classification problems. It is well suited to small samples and is characterized by good generalization ability. SVM also provides viable tools for nonlinear problems and offers great flexibility and capability for complex, nonlinear dynamical systems. Many improved SVM algorithms have also been proposed. In contrast to the standard SVM, which uses inequality constraints, Suykens proposed LS-SVM [11]. In this method, the squared 2-norm of the error becomes the optimization goal of the loss function, so the quadratic programming problem is transformed into a set of linear equations. Monroy et al. proposed a semisupervised approach [12] that combines Gaussian mixture models (GMM), independent component analysis (ICA), the Bayesian information criterion (BIC), and SVM, and applied it effectively to the entire set of Tennessee Eastman process (TEP) faults. Combining supervised or unsupervised learning methods with SVM is a research hotspot for improving the performance of fault diagnosis, and many algorithms for optimizing the traditional SVM have also shown improved performance. Following this idea, we propose a novel method for satellite fault diagnosis, called SVM based on a hybrid voting mechanism (HVM-SVM). Considering the characteristics of small sample data, multiple faults, and numerous parameters, the main contribution of this paper is to improve the performance of SVM using a hybrid voting mechanism.
The experimental results show that HVM-SVM improves the classification accuracy for multiple faults.
The rest of this paper is organized as follows. Section 2 outlines the problem of satellite fault diagnosis. Section 3 contains the solution method. Section 4 introduces the hybrid voting mechanism. The experimental results are discussed in Section 5. We conclude the paper in Section 6.

The Problem of Satellite Fault Diagnosis
For satellite fault diagnosis, two problems need to be considered: multiple parameters and multiple faults. A satellite is such a complex system that many component parameters are needed to record its running states. Taking satellite remote sensing data as an example, the number of parameters is more than two thousand. Obviously, a given fault cannot be related to all of these thousands of parameters, so how to determine the mapping between faults and parameters is the first problem.
Faults are usually divided into two categories: single faults and multiple faults. In the single-fault mode, it is assumed that only one fault exists in the system at any time. Frequent testing and maintenance are needed to guarantee this condition, and this can introduce uncertainty into the interpretation of results. Moreover, the single-fault assumption is unreasonable in many real applications, such as fault-tolerant systems and space-based systems, where frequent testing and maintenance are not possible. Since the single-fault assumption can lead to incorrect or failed diagnoses when multiple faults occur, multiple-fault diagnosis is more important. However, because the number of candidate fault combinations grows exponentially, multiple-fault diagnosis is a challenging problem. In addition, multiple faults in dynamic systems such as satellites may be hard to detect, because interactions among fault effects can obscure the fault signatures.
The problem of multiple-fault diagnosis can be described as follows. We denote the fault set by $F = \{f_1, f_2, \ldots, f_n\}$. If the system is diagnosable, there is a unique fault that is consistent with the deviations of some measurements. There are two reasons why multiple-fault diagnosis is more complex than single-fault diagnosis. First, the effects of one fault can be masked or compensated by another fault. For example, a fault $f_1$ may occur, causing deviations of 0− on $p_1$, 0+ on $p_2$, and 0− on $p_3$. However, if a fault $f_2$ occurs concurrently, causing the reverse deviations of 0+ on $p_1$, 0− on $p_2$, and 0+ on $p_3$, then the two faults may not be distinguishable. Second, the same multiple faults can be manifested in different ways. For example, the fault set $\{f_1, f_2\}$ will cause 0− or ± on a parameter, depending on which fault occurs first and on the fault delays in the system. As Figure 1 illustrates, once one fault has occurred, whether 0− or ± is observed depends on how soon the other fault follows it. If the two faults occur close enough together, the 0− effect may not be detected.
Figure 1: (a) The second fault occurs close enough to the first; (b) the second fault occurs after the first (0− and 0+ represent that the value of the parameter decreases and increases, respectively; ± represents that the value first increases and then decreases).
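The masking effect described above can be illustrated with a small sketch. It assumes, purely for illustration, that qualitative deviations add up and that the two faults have effects of equal magnitude; the fault names `f1`, `f2` and parameter names `p1` to `p3` are hypothetical placeholders, not identifiers from the satellite data.

```python
# Qualitative deviations: -1 = decrease (0-), +1 = increase (0+), 0 = no change.
# Fault and parameter names are hypothetical placeholders.
FAULT_EFFECTS = {
    "f1": {"p1": -1, "p2": +1, "p3": -1},
    "f2": {"p1": +1, "p2": -1, "p3": +1},
}


def observed_deviation(active_faults):
    """Sum the qualitative effects of all active faults, clipped to [-1, 1]."""
    total = {}
    for fault in active_faults:
        for param, sign in FAULT_EFFECTS[fault].items():
            total[param] = total.get(param, 0) + sign
    return {p: max(-1, min(1, s)) for p, s in total.items()}


single = observed_deviation(["f1"])       # the signature of f1 alone
both = observed_deviation(["f1", "f2"])   # concurrent faults cancel out
print(single)
print(both)
```

Under these assumptions the concurrent case shows no deviation on any parameter, so a diagnoser that looks only at the deviations cannot tell the double fault from normal operation.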

Solution Methodology
In Section 2, it was pointed out that some faults are very difficult to find because of the interactions between parameters. A comprehensive judgment method that provides more complementary information about the faults is therefore needed. In this section, we integrate the diagnosis results from fault associativity, SVM, and combining classifiers to improve the accuracy of satellite fault diagnosis.

Fault Associativity.
As described in Definition 1, fault associativity concerns the relation between a fault and its corresponding parameter set. If all the parameters are used for modeling a fault, the accuracy of the fault diagnosis will decrease. Determining the mapping between faults and parameters is therefore a key step in satellite fault diagnosis. Sometimes this mapping can be determined by domain experts. However, it is unreasonable to expect experts to consider so many parameters, especially for unknown faults. A method based on rough sets [13] is therefore used to improve the selection of parameters.
An information system can be represented as an ordered quadruple $S = \langle U, A, V, \rho \rangle$, which consists of the following: $U = \{x_1, x_2, \ldots\}$ is a nonempty, finite set called the universe; $A = C \cup D$ is a nonempty, finite set of all attributes, in which $C$ is the condition attribute set and $D$ is the decision attribute set, with $C \cap D = \emptyset$; $V = \bigcup_{a \in A} V_a$ is the set of attribute values, where $V_a$ is called the domain of $a$; and, for $x \in U$ and $a \in A$, $\rho(x, a)$ is the value of $x$ on attribute $a$. In the following fault diagnosis table, the columns are labeled by attributes and the rows by objects (the classes of faults), with $U = \{1, 2, 3, 4, 5, 6\}$ and $A$ consisting of five attributes.
For every subset of attributes $B \subseteq A$, define an indiscernibility binary relation $\mathrm{IND}(B)$:

$$\mathrm{IND}(B) = \{(x, y) \in U \times U : \rho(x, a) = \rho(y, a) \ \text{for all } a \in B\},$$

where $\mathrm{IND}(B)$ is an equivalence relation, and objects $x$, $y$ satisfying the relation $\mathrm{IND}(B)$ are indiscernible by the attributes from $B$. Consider a subset $B$ of three attributes in Table 1.
As mentioned above, in order to find the fault associativity, parameter reduction is necessary. Given a parameter subset $B \subseteq A$, if $R \subseteq B$ is independent and $\mathrm{IND}(R) = \mathrm{IND}(B)$, then $R$ is called a parameter reduction of $B$, that is, $\mathrm{red}(B) = \{R\}$. The parameter reduction algorithm based on the discernibility matrix is given in Algorithm 1.
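The two definitions above (indiscernibility classes and parameter reduction) can be made concrete with a minimal sketch. It is a brute-force search over attribute subsets, not the discernibility-matrix method of Algorithm 1, and the four-object table with attributes `a`, `b`, `c` is a hypothetical example rather than Table 1.

```python
from itertools import combinations

# Hypothetical information table: rows are objects, keys are attribute names.
TABLE = {
    1: {"a": 0, "b": 1, "c": 0},
    2: {"a": 0, "b": 1, "c": 1},
    3: {"a": 1, "b": 0, "c": 0},
    4: {"a": 1, "b": 0, "c": 1},
}


def ind_partition(table, attrs):
    """Equivalence classes of IND(attrs): objects with equal values on attrs."""
    classes = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return {frozenset(c) for c in classes.values()}


def reducts(table, attrs):
    """All minimal subsets R of attrs with IND(R) == IND(attrs) (brute force)."""
    full = ind_partition(table, attrs)
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not independent
            if ind_partition(table, subset) == full:
                found.append(subset)
    return found


print(ind_partition(TABLE, ["a"]))
print(reducts(TABLE, ["a", "b", "c"]))
```

In this toy table `a` and `b` carry the same information, so either one can be dropped and two reductions result. The exhaustive search is exponential in the number of attributes and is only meant to make the definitions tangible.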
It can be seen from the reduction algorithm that there can be multiple reductions, and here $\mathrm{red}(A)$ is the candidate set of fault associativities. Which reduction in $\mathrm{red}(A)$ is chosen for a fault depends on which one gives the highest accuracy of the fault diagnosis model trained with it. Suppose the fault set is $F = \{f_1, f_2, f_3\}$ and $\mathrm{red}(A) = \{R_1, R_2, R_3\}$; model $M_1$ is trained using the parameter set $R_1$, model $M_2$ using $R_2$, and model $M_3$ using $R_3$. The accuracy of model $M_j$, trained on $R_j$, for the fault $f_i$ is denoted by $\mathrm{Acc}(M_j, R_j, f_i)$. The parameter set corresponding to fault $f_i$ can then be defined as

$$P_i = \arg\max_{R_j \in \mathrm{red}(A)} \mathrm{Acc}(M_j, R_j, f_i).$$

For example, if $\mathrm{Acc}(M_1, R_1, f_1)$ is higher than $\mathrm{Acc}(M_2, R_2, f_1)$ and $\mathrm{Acc}(M_3, R_3, f_1)$, then $R_1$ will be selected as the parameter set related to fault $f_1$.
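The selection rule in this paragraph amounts to an arg-max over the candidate reductions. The sketch below shows it under the assumption that the accuracy of every model has already been measured for every fault; the accuracy values and the labels `R1` to `R3`, `f1`, `f2` are made up for illustration.

```python
def select_parameter_set(accuracy, fault):
    """Return the candidate reduction whose model scores highest for `fault`."""
    candidates = accuracy[fault]
    return max(candidates, key=candidates.get)


# Hypothetical accuracies: ACCURACY[fault][reduction] is the accuracy, for
# that fault, of the model trained on that reduction.
ACCURACY = {
    "f1": {"R1": 0.93, "R2": 0.88, "R3": 0.71},
    "f2": {"R1": 0.64, "R2": 0.90, "R3": 0.85},
}

chosen = {fault: select_parameter_set(ACCURACY, fault) for fault in ACCURACY}
print(chosen)
```

With these numbers, `R1` is kept for `f1` and `R2` for `f2`, which mirrors the example in the text.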

The Fault Diagnosis
SVM is based on the structural risk minimization concept, and it has been widely applied in fault diagnosis. The fundamentals of the classical SVM are presented first. SVM is a binary classifier that classifies data into two classes: positive and negative. Given a set of points from two classes, SVM establishes a hyperplane that separates the majority of the positive points from the negative points and maximizes the distance from the two classes to this hyperplane. This maximum-distance hyperplane is also called the optimal separating hyperplane. The points of the two classes nearest to the hyperplane define the support vectors. Figure 2 shows an example of the optimal separating hyperplane for two classes.
Suppose a known training set $\{x_i, y_i\}$ ($i = 1, \ldots, n$), where $x_i \in \mathbb{R}^d$ is the input vector and $y_i \in \{-1, 1\}$ is the required classification. The SVM estimates a function that separates the given data $\{x_i, y_i\}$. The optimal hyperplane is defined as

$$w \cdot x + b = 0,$$

where $w \in \mathbb{R}^d$ is a vector of weights and $b$ is a scalar bias term; $w$ and $b$ describe the position of the hyperplane. A vector $x_i$ with class label $y_i$ must satisfy

$$y_i (w \cdot x_i + b) \geq 1.$$

In satellite fault diagnosis, most patterns are not linearly separable. To handle this at moderate computational cost, the SVM maps the input vectors into a higher dimensional space, called the feature space, by a nonlinear mapping $\phi$ chosen a priori, and constructs the optimal separating hyperplane in that space. A nonnegative slack variable $\xi_i$ is defined for every training sample to obtain a hyperplane with a larger margin; this also permits some samples to be misclassified. Searching for the optimal hyperplane then amounts to solving the following constrained quadratic optimization problem:

$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i (w \cdot \phi(x_i) + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n,$$

where $C$ is the regularization parameter that determines the balance between maximizing the margin and minimizing the classification error. If $0 \leq \xi_i \leq 1$, then $x_i$ is on the right side of the hyperplane and the pattern is classified correctly. If $\xi_i > 1$, then $x_i$ is on the wrong side of the hyperplane.
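The role of the slack variables can be checked numerically. The sketch below evaluates the slack of each sample and the soft-margin objective for a fixed hyperplane, assuming a linear kernel (no feature mapping) and using the fact that, for given $w$ and $b$, the smallest feasible slack is $\max(0, 1 - y_i(w \cdot x_i + b))$. The hyperplane and the four samples are hypothetical.

```python
def slack(w, b, x, y):
    """Smallest feasible slack: max(0, 1 - y * (w . x + b))."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def objective(w, b, samples, C):
    """Soft-margin primal objective: 0.5 * ||w||^2 + C * sum of slacks."""
    norm_sq = sum(wi * wi for wi in w)
    return 0.5 * norm_sq + C * sum(slack(w, b, x, y) for x, y in samples)


W, B = (1.0, 0.0), 0.0  # hypothetical hyperplane x1 = 0
SAMPLES = [
    ((2.0, 0.0), +1),   # outside the margin
    ((0.5, 1.0), +1),   # inside the margin but on the right side
    ((-1.0, 0.0), +1),  # on the wrong side of the hyperplane
    ((-3.0, 1.0), -1),  # outside the margin
]

slacks = [slack(W, B, x, y) for x, y in SAMPLES]
print(slacks)
print(objective(W, B, SAMPLES, C=1.0))
```

The second sample has a slack between 0 and 1 and is still classified correctly, while the third has a slack above 1 and is misclassified, matching the two cases distinguished in the text.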
The basic form of SVM is a binary classifier that separates a set of positive examples from a set of negative examples, also called a dichotomy. Unfortunately, for more than two classes there is no unique way for SVM to deal with multiple faults. The general approach to adapting SVM to multiple classes is to reduce the multiclass problem to a set of binary problems. One method is to construct $k$ binary classifiers, where $k$ is the number of classes. It is called one-against-all because every binary classifier separates one class from all the other classes. To classify a new sample, each binary classifier produces an output, and the class with the highest confidence is finally chosen. Another strategy, one-against-one, constructs $k(k-1)/2$ binary classifiers, each of which separates only two classes. For a new sample, every classifier votes for one of its two classes, and the class with the maximum votes is selected. This method is more efficient than one-against-all, but it has a major drawback: each classifier model is trained with data from only two classes and does not consider the fault associativity, whereas in the fault diagnosis phase the input data may come from any class [14]. To solve this problem, the combining classifiers strategy is used to obtain a synthesized decision.
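The pairwise strategy (one binary classifier per pair of classes, followed by a vote) can be sketched as follows. To keep the example self-contained, each trained binary SVM is replaced by a toy nearest-centre rule on one-dimensional data; the class centres and the labels `f1`, `f2`, `N` are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

CENTROIDS = {"f1": 0.0, "f2": 5.0, "N": 10.0}  # hypothetical 1-D class centres


def nearest_of_pair(a, b, sample):
    """Toy stand-in for a trained binary SVM separating classes a and b."""
    if abs(sample - CENTROIDS[a]) <= abs(sample - CENTROIDS[b]):
        return a
    return b


def pairwise_vote(classes, decide, sample):
    """Each of the k(k-1)/2 pairwise classifiers votes; most votes wins."""
    votes = Counter(decide(a, b, sample) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0], votes


classes = list(CENTROIDS)
winner, votes = pairwise_vote(classes, nearest_of_pair, 6.0)
print(winner, dict(votes))
```

With three classes there are three pairwise classifiers. Note that the classifier for the pair (`f1`, `N`) still has to vote even though the sample belongs to neither class, which is the drawback mentioned in the text.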

Combining Classifiers.
As different classifiers may offer complementary information about the fault to be classified, combining classifiers in an efficient way can achieve better classification results than any single classifier. For multiple-fault diagnosis, the ultimate goal of combining classifiers is to achieve the best possible classification performance on the faults. A combining classifier merges the outputs of multiple classifiers into one classification result according to some rule. An example of a combining classifier is shown in Figure 3. There are many combination rules, such as the max rule, min rule, median rule, and majority vote rule.
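Suppose an ensemble $E$ is composed of $L$ classifiers; for a sample $x \in X$, the output of all classifiers in $E$ is $D(x) = (d_1(x), \ldots, d_i(x), \ldots, d_L(x))$. Let $\Delta_{ki} = 1$ if classifier $i$ votes for class $\omega_k$ (for example, when its output for that class exceeds a set threshold) and $\Delta_{ki} = 0$ otherwise. The majority vote rule can then be described as

$$\text{assign } x \to \omega_j \quad \text{if} \quad \sum_{i=1}^{L} \Delta_{ji} = \max_{k} \sum_{i=1}^{L} \Delta_{ki}. \qquad (9)$$

That is, for each class $\omega_k$, the sum on the right-hand side of (9) simply counts the votes received for this hypothesis from the individual classifiers. The class that receives the largest number of votes is then selected as the consensus (majority) decision.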
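The vote counting in the majority vote rule can be written in a few lines. In this sketch each classifier is assumed to vote for the class to which it gives the highest score, which is one common way of turning continuous outputs into binary votes; the score matrix is invented for illustration.

```python
def majority_vote(scores):
    """scores[i][k] is the support of classifier i for class k.

    Each classifier votes for its top-scoring class, and the class with the
    largest number of votes is returned together with the vote counts.
    Ties are resolved in favour of the lowest class index.
    """
    n_classes = len(scores[0])
    votes = [0] * n_classes
    for row in scores:
        votes[row.index(max(row))] += 1
    return votes.index(max(votes)), votes


# Hypothetical outputs of five classifiers for three classes.
SCORES = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.3, 0.5],
    [0.5, 0.4, 0.1],
]

winner, votes = majority_vote(SCORES)
print(winner, votes)
```

Here three of the five classifiers vote for class 0, so it is chosen even though two classifiers disagree.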
Take the XOR problem as an example; the solution using combining classifiers is as follows. The decision equations of the three classifiers are of the form $C_1\colon y = (-x_1 + 0.5) \cap (x_2 + 0.5)$, and the decision regions are shown in Figure 5. Each single classifier achieves an accuracy of only 75% on this problem, but the accuracy reaches 100% when the classifiers are combined with the majority vote rule. That is, although the classification accuracy of a single classifier is sometimes low, it can be improved greatly by combining classifiers.
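The 75% versus 100% claim can be reproduced with three simple linear classifiers. The decision functions below are chosen for this sketch and are not necessarily those shown in Figure 5; each one is wrong on exactly one of the four XOR points.

```python
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Three linear threshold classifiers (assumed for illustration).
CLASSIFIERS = [
    lambda x1, x2: int(x1 + x2 - 0.5 > 0),  # OR: wrong only on (1, 1)
    lambda x1, x2: int(x1 - x2 - 0.5 > 0),  # x1 and not x2: wrong only on (0, 1)
    lambda x1, x2: int(x2 - x1 - 0.5 > 0),  # x2 and not x1: wrong only on (1, 0)
]


def accuracy(predict):
    """Fraction of the four XOR points that `predict` labels correctly."""
    return sum(predict(*x) == y for x, y in XOR.items()) / len(XOR)


def combined(x1, x2):
    """Majority vote: output 1 when at least two classifiers output 1."""
    return int(sum(c(x1, x2) for c in CLASSIFIERS) >= 2)


single = [accuracy(c) for c in CLASSIFIERS]
print(single, accuracy(combined))
```

Because the three classifiers make their single error on different points, every point is classified correctly by at least two of them, and the majority vote recovers XOR exactly.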
Considering the fault associativity, there will be $n$ fault models $M_i$ ($i = 1, 2, \ldots, n$, where $n$ is also the number of faults), each trained with the data of the parameter set $P_i$ related to fault $f_i$, using the second method mentioned in Section 3.2. Combining these fault classifiers not only improves the accuracy of the model but also covers all the fault classes.

Hybrid Voting Mechanism
Satellite fault diagnosis is a classical multiple-fault problem. Its complexity comes from the fact that not only are the types of faults numerous, but the number of parameters is also large. When SVM is used to diagnose satellite faults, two problems need to be solved. First, with so many parameters, how can the mapping between faults and parameters be found and the interactions among them be reduced? Second, with multiple classifiers, how should their results be combined? To address these two problems, a multiple-model SVM based on a hybrid voting mechanism (HVM-SVM) is proposed, in which not only do the combined classifiers vote, but the fault associativity is also added to improve the voting. In satellite fault diagnosis, when a fault occurs, only some of the parameters related to it change. That is to say, if some related parameters are abnormal, the fault may have happened.
Suppose a parameter set $P_i = \{p_1, \ldots, p_j, \ldots, p_m\}$ is related to a fault $f_i$, where $P_i \subseteq P$, $P$ includes all parameters, and $\theta_j$ is the threshold of $p_j$. A fault signature is defined for each $p_j$ from its expected output and its actual output, and, considering the interactions among the parameters, the essential condition $C(f_i)$ for the occurrence of fault $f_i$ is defined from these signatures.
HVM-SVM is a new combined strategy based on SVM. It generates multiple fault models by using SVM to learn all the fault data restricted to the related parameter sets. By combining the results from these models with the essential condition of faults for the new data, HVM-SVM obtains higher fault recognition performance. In the combination, because different fault models contribute differently to the fault diagnosis, how to set the weight of each single SVM classifier is very important. A popular solution is to assign the weight of each classifier according to its classification error rate [15]: assuming the classification error rate of model $M_i$ is $e_i$, the weight of the model is defined as a function of $e_i$.
The HVM-SVM algorithm is divided into two stages: fault feature extraction and model training. Feature extraction establishes the parameter set related to each fault, and it is carried out by the parameter reduction algorithm described in Section 3.1. For the fault $f_i$ with $\mathrm{red}(A) = P_i$, model $M_i$ is trained using only the data of $P_i$, which reduces the harmful influence of unrelated parameters on model training. The model training algorithm is given in Algorithm 2.

Algorithm 2
Input: sample set for training (including normal data and fault data): $(x_1, x_2, \ldots, x_n, y)$, where $x_i$ is a feature value and $y$ is the class label.
Output: fault diagnosis model $M(x)$.
(1) Standardize the sample set $(x_1, x_2, \ldots, x_n, y)$.
(2) Parameter selection: choose the kernel function $K(x_i, x_j)$ and the kernel's parameters.
(3) Compute the Lagrangian coefficients $\alpha$.
(4) Obtain the support vectors sv().
(5) Compute the threshold value $b$.
(6) Establish the optimal separating hyperplane.
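As an illustration of weighting a classifier by its error rate, the sketch below uses the AdaBoost-style weight $\frac{1}{2}\ln((1 - e)/e)$. This particular form is an assumption chosen because it is widely used; it is not taken from the paper, whose own weight formula may differ.

```python
import math


def classifier_weight(error_rate):
    """AdaBoost-style weight 0.5 * ln((1 - e) / e); an assumed form."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error rate must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - error_rate) / error_rate)


# Hypothetical error rates of three fault models.
for e in (0.1, 0.3, 0.5):
    print(e, round(classifier_weight(e), 3))
```

Under this choice a model that is no better than chance (error rate 0.5) receives zero weight, and the weight grows as the error rate falls.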
For multiple faults, a fault model $M_i(x)$ is generated for each fault $f_i$ using the data of its related parameter set $\mathrm{red}(A) = P_i$. A decision function $d(M_i, x_k)$ is defined for the $k$th data record $x_k$ using $M_i(x)$; it returns the classification result of the data record. The function $d(M_i, x_k)$ is only the decision from the model $M_i(x)$; HVM-SVM also considers the essential condition of the fault derived from the related parameter set, as shown in Figure 6.
A vector describing the essential condition of each fault is then defined, and the hybrid decision function applies the majority vote rule to the decisions of the fault models together with this vector. The algorithm for the multiple-fault model is described in Algorithm 3.
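A hybrid vote of this kind can be sketched as follows. Several details are assumptions made for this sketch and are not taken from the paper: a parameter is flagged when its deviation from the expected value exceeds its threshold, the essential condition of a fault holds when at least one related parameter is flagged, and a satisfied condition adds one vote for that fault. All readings and labels are hypothetical.

```python
from collections import Counter


def fault_signature(expected, actual, threshold):
    """1 if the deviation of a parameter exceeds its threshold, else 0."""
    return int(abs(actual - expected) > threshold)


def essential_condition(readings):
    """Assumed rule: holds if any related parameter shows a fault signature."""
    return any(fault_signature(e, a, t) for e, a, t in readings)


def hybrid_vote(model_decisions, condition_by_fault):
    """Majority vote over model decisions plus one vote per satisfied condition."""
    votes = Counter(model_decisions)
    for fault, holds in condition_by_fault.items():
        if holds:
            votes[fault] += 1
    return votes.most_common(1)[0][0], votes


# (expected, actual, threshold) for the parameters related to each fault.
READINGS = {
    "f1": [(28.0, 28.1, 0.5), (5.0, 6.2, 0.5)],
    "f2": [(12.0, 12.1, 0.5)],
}
conditions = {f: essential_condition(r) for f, r in READINGS.items()}

# The three models disagree; the essential condition breaks the tie.
winner, votes = hybrid_vote(["f1", "f2", "N"], conditions)
print(conditions, winner, dict(votes))
```

In this example the three classifier votes are split, and the extra vote from the essential condition of `f1` decides the outcome, which shows how the fault associativity can improve the voting.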

Experimental Evaluation
Experiments were carried out to evaluate the accuracy of HVM-SVM, from single faults to multiple faults. For multiple-fault diagnosis, SVM, $k$-nearest neighbor, and neural networks are selected for comparison with HVM-SVM. Satellite remote sensing data, including normal data and fault data, are chosen as the test data. The false alarm rate (FAR) and the false negative rate (FNR) are used to measure the results of the single-fault model, which are shown in Figure 7. In Figure 7, it can be seen that FAR = 8/(8 + 478) = 0.016, but FNR = 199/(199 + 20) = 0.91. That is, more than 90% of the fault data are diagnosed as normal. The reason is that the diagnosis model of the fault is trained using all the parameters, without parameter reduction, which makes the fault model inaccurate and unstable. In order to reduce the interactions from irrelevant parameters, we then used only the parameters in the related parameter set to train the diagnosis model of the fault. The results of this model are shown in Figure 8.
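The two rates quoted for Figure 7 can be recomputed from the counts in the text. The sketch assumes the standard definitions (FAR as the share of normal records flagged as faulty, FNR as the share of fault records that are missed); the function and argument names are chosen for this example.

```python
def false_alarm_rate(false_alarms, true_normals):
    """FAR = normal records flagged as faulty / all normal records."""
    return false_alarms / (false_alarms + true_normals)


def false_negative_rate(missed_faults, detected_faults):
    """FNR = fault records diagnosed as normal / all fault records."""
    return missed_faults / (missed_faults + detected_faults)


far = false_alarm_rate(8, 478)
fnr = false_negative_rate(199, 20)
print(f"FAR = {far:.3f}, FNR = {fnr:.2f}")
```

The low FAR together with the very high FNR shows that the model trained on all parameters rarely raises an alarm at all, so it misses most of the true faults.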

Multiple Faults Model.
Obviously, satellite fault diagnosis is not a single-fault problem. Suppose there are three modes in the satellite system: fault $f_1$, fault $f_2$, and normal $N$. The related parameters are also considered, and the training samples are composed of the data of the parameter sets $P_1$ and $P_2$, respectively. Using the second method of SVM introduced in Section 3.2, a simple multiple-fault model can be obtained. The results of diagnosing faults with this model are shown in Figure 9.
It can be seen from Figure 9 that using a single model to classify the faults is not a good approach, because the FAR of each class is high. In the above example, the FARs are 24.6%, 30%, and 64.6%, respectively. For multiple faults, a single model performs poorly. Next, HVM-SVM is used to diagnose the multiple faults on the same data, extended with more records. Three models, $M_1$, $M_2$, and $M_3$, are trained from the related data: $M_1$ is obtained from $P_1$, $M_2$ from $P_2$, and $M_3$ from the normal data. The results of using the three models separately to diagnose the satellite data are shown in Figure 10.
It can be seen from Figure 10 that each model produces its own classification results for the three modes, so a way of choosing the best decision is needed. We used the hybrid voting mechanism of HVM-SVM to integrate the results, which are given in Table 2.

Table 2: Results of HVM-SVM after hybrid voting (number of records; columns give the mode of the data, rows the diagnosed mode).

Diagnosed as    Fault f1    Fault f2    Normal N
Fault f1        3288        8           3
Fault f2        3           3106        30
Normal N        0           3           1559

That is, after four votes (three by the classifiers and one by the essential condition of faults), there are 3 records of fault $f_1$ misclassified as fault $f_2$, 11 misclassified records in fault $f_2$, and 33 misclassified records in normal $N$. The fault recognition rates of $f_1$, $f_2$, and $N$ are 99.9%, 99.6%, and 97.9%, respectively.
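The recognition rates can be checked against the record counts reported for Table 2. The sketch assumes that the rate of a mode is the number of its correctly recognised records divided by all records of that mode; the labels `f1`, `f2`, and `N` stand for the two faults and the normal mode.

```python
# (correctly recognised, misclassified) record counts per mode, taken from
# the Table 2 values reported in this section.
COUNTS = {"f1": (3288, 3), "f2": (3106, 11), "N": (1559, 33)}


def recognition_rate(correct, wrong):
    """Share of the records of one mode that are recognised correctly."""
    return correct / (correct + wrong)


rates = {mode: round(100 * recognition_rate(*c), 1) for mode, c in COUNTS.items()}
print(rates)
```

The computed percentages agree with the 99.9%, 99.6%, and 97.9% stated in the text.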

The Accuracy of HVM-SVM.
HVM-SVM uses the combining classifiers strategy to improve the accuracy of fault diagnosis. In fact, many classification methods could be used as the classifiers, such as neural networks, decision trees, $k$-nearest neighbor, and naive Bayes. Why is only SVM used for HVM-SVM? As mentioned in Section 1, SVM is well suited to satellite fault diagnosis because of the small number of fault samples. The following experiment demonstrates this. SVM, neural networks, and decision trees are selected to test the FAR on small fault samples. It can be seen from Figure 11 that, as the number of training samples becomes smaller, the FAR of SVM is significantly lower than that of the other two methods.
The performance of HVM-SVM is measured by the accuracy rate of multiple-fault diagnosis, compared with SVM, neural networks, and $k$-nearest neighbor. The experimental data and fault types are the same as those used above in Section 5.
The experimental results are shown in Table 3. From Table 3, HVM-SVM achieves the best performance in multiple-fault diagnosis; in particular, when the percentage of training samples drops to 1%, it still has an accuracy rate of almost 70% for separating the fault. Most methods become more precise as the number of training samples increases, but the neural networks show some instability in accuracy. Although SVM achieves higher accuracy for small samples, in multiple-fault diagnosis it may misclassify faults with high probability. The comparison of the methods in terms of fault recognition rate (FRR) is also illustrated in Figure 12.

Figure 10: Using three models to diagnose the satellite remote sensing data, respectively.

Conclusion
Satellite fault diagnosis differs from general fault diagnosis in its special features, such as numerous parameters, multiple faults, and small samples. The limitations and complexity of the space environment make the problem more serious. In this paper, we introduced an improved SVM algorithm based on a hybrid voting mechanism to enhance the accuracy of satellite fault diagnosis. To reduce the interactions among multiple parameters, we proposed a parameter reduction algorithm to find the mapping between faults and parameters. SVM is suitable for classifying small samples but not for multiple faults, so we combine multiple SVM classifiers and use the majority vote rule to deal with multiple faults. The contribution of our method is that not only do the combined classifiers vote, but the fault associativity (the essential condition of faults) is also added to improve the voting. The experimental results illustrate that the HVM-SVM method is well suited to satellite fault diagnosis and that it performs best among the classification methods compared.