A review of fault diagnosis and fault-tolerant control of vehicular polymer electrolyte membrane fuel cell power system

The proton exchange membrane fuel cell (PEMFC) is a promising vehicle power source because of its high energy conversion efficiency and pollution-free reactants. However, the complex structure of multi-system coordination makes the parameters strongly correlated, the fault rate high and the control difficult. Timely and accurate fault diagnosis and effective fault-tolerant control are of great significance to the stable output and durability of the PEMFC system. First, this paper classifies different faults and describes the applicable scenarios of each classification method. Then, the fault diagnosis methods based on experiment, model and data are summarized, and their advantages and disadvantages are compared. Next, the methods and characteristics of hardware and software fault tolerance are analyzed from the perspective of redundancy, and the application of fault-tolerant control in maintaining the stability of the PEMFC system is summarized. Finally, directions for improvement and development prospects of fault diagnosis technology for vehicular PEMFC systems are proposed.


INTRODUCTION
Because of their high energy efficiency and zero pollutant emissions, fuel cells have a bright future in power applications such as automobiles. For the power demand of the automobile, the proton exchange membrane fuel cell (PEMFC) has the prominent advantages of fast fuel replenishment speed, fast start-up speed

CLASSIFICATION OF FAULTS
A fuel cell system is subject to many types of fault, and the diagnosis methods and mitigation measures differ for each type, so the fault type must be determined before diagnosis. Fault diagnosis mostly considers the influence of the fault on whole-vehicle operation, abnormalities of system components and abnormalities of system output parameters. This paper summarizes three common classification methods: by fault level, by fault interval and by fault characterization.

Fault level
Classification by fault level grades a fault according to its impact on vehicle operation from the perspective of safety and stability, and each level corresponds to different control strategies and intervention measures. At present, faults are generally divided into four levels; the corresponding measures and characteristics are shown in Table Ⅰ [11].

The main assembly is damaged, and the fault cannot be eliminated within a short time (30 min) with onboard tools and fragile spare parts; it needs to be repaired soon.

Level 3 Power-down fault
The fault can be eliminated within a short time (about 30 min) with onboard tools and fragile spare parts or by restarting; it is a hidden danger that may cause a greater fault.

Level 4 Warning fault
No parts need to be replaced; the fault can be cleared within a short time (about 5 min) using onboard tools or by restarting.
Levels 1 to 4 correspond to dangerous, serious, general and slight faults, which is a common scheme for fuel cell fault classification [12]. The corresponding actions are: emergency shutdown for a first-level fault; personnel intervention to judge whether to shut down for a second-level fault; power output limitation for a third-level fault; and a warning without interference for a fourth-level fault [13].
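The level-to-action mapping described above can be sketched as a small lookup. This is only an illustration: the names `FaultLevel`, `ACTIONS` and `respond`, and the action labels, are hypothetical and do not come from [11]-[13]; the sketch also assumes that the most severe active fault decides the response.

```python
from enum import IntEnum


class FaultLevel(IntEnum):
    """Four fault levels; 1 is the most severe."""
    DANGEROUS = 1
    SERIOUS = 2
    GENERAL = 3
    SLIGHT = 4


# Actions paraphrased from the text; the string labels are illustrative.
ACTIONS = {
    FaultLevel.DANGEROUS: "emergency_shutdown",
    FaultLevel.SERIOUS: "request_operator_decision",
    FaultLevel.GENERAL: "limit_power_output",
    FaultLevel.SLIGHT: "warn_only",
}


def respond(active_levels):
    """Return the action for the most severe active fault, or None."""
    if not active_levels:
        return None
    worst = min(active_levels)  # lower number = more severe
    return ACTIONS[FaultLevel(worst)]
```

For instance, `respond([4, 3])` selects the power limitation action because level 3 outranks level 4.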

Fault interval
Fault interval refers to the location of the fault. Knowing the common faults of each component in advance helps to avoid them. As shown in Figure 1, possible fault locations include the stack, hydrogen supply system, air supply system, thermal management system, sensors, and so on.

Stack fault
Frequent flooding and drying faults affect the transport performance of the PEM and even the stack life. Both are recoverable faults that can be mitigated by control.

Hydrogen/air supply system fault
The common faults in the hydrogen/air supply system are improper pressure and flow rate in the gas pipeline, which affect the output power of the stack and can even result in hydrogen leakage, oxygen starvation and pipeline rupture [14]. The hydrogen pressure valve and the air compressor play a key role in regulation [15].

Thermal management system fault
The most obvious effect of a thermal management system fault on the PEMFC is an imbalance in temperature and humidity regulation. Common faults include low cooling water pressure, high cooling water temperature at the stack outlet, and cooling pipe leakage and blockage. Section-by-section monitoring is a common and effective method for pipeline faults [16].

Sensor fault
The complex environment of the PEMFC system affects sensor stability and can lead to sensor faults. To ensure sensor reliability, in [17] the mean, standard deviation and slope of the sensor signal were extracted and analyzed to detect sensors whose humidity measurement is abnormal because of liquid water accumulation.
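The three signal features named above can be computed over a sliding window as in the following minimal sketch. The function name `window_features`, the constant sampling period `dt` and the use of a least-squares slope are assumptions made for illustration; the exact feature definitions used in [17] are not given here.

```python
from statistics import mean, pstdev


def window_features(samples, dt=1.0):
    """Return (mean, standard deviation, slope) of an evenly sampled window.

    The slope is the least-squares slope of the samples against time,
    with `dt` the (assumed constant) sampling period.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    t = [i * dt for i in range(n)]
    t_bar, y_bar = mean(t), mean(samples)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, samples))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    return y_bar, pstdev(samples), sxy / sxx
```

A healthy humidity sensor would be expected to give features inside a band learned from normal operation; how that band is chosen is outside the scope of this sketch.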
The subsystem faults above can be alleviated or avoided by changing the operating conditions and optimizing the components. Fault-tolerant control is clearly effective for sensor faults. In addition, section-by-section detection is a common method in subsystem fault diagnosis; it facilitates fault location and improves diagnostic efficiency.

Fault characterization
Faults of the fuel cell system are caused by many coupled factors. Experiments have shown that the internal resistance and output voltage of the fuel cell are sensitive to faults. Therefore, in research focusing on output performance, abnormalities of the internal resistance and output voltage are collectively referred to as fault characterization in this paper.

Abnormal internal resistance
The internal resistance mainly comprises the ohmic impedance, the polarization internal resistance and the concentration-difference internal resistance. It depends on the properties of the PEM, catalyst layer and gas diffusion layer, which are affected by temperature and humidity.
Because the causes are coupled, a change in internal resistance does not correspond to a single fault. In [18], experimental tests and analysis of EIS plots showed that when the humidity is too low, the ohmic resistance increases sharply and a drying fault may even occur. In [19], the polarization internal resistance was shown to be positively correlated with the degree of dryness.

Abnormal voltage
Faults of the PEMFC system affect the internal flow field, mass transfer and so on, which leads to output voltage fluctuation. Therefore, the output voltage of the stack can be obtained from a simulation model or an experimental bench to analyze the voltage change and the pressure drop between the two electrodes, and to obtain the correlation between the voltage change and the fault.
The voltage drop of a flooded single cell was shown to be positively correlated with the amount of accumulated water in [20] [21]. In [22], it was shown that when the membrane is slightly dry, the voltage fluctuates and drops significantly; when it is seriously dry, the voltage drops sharply; and under severe flooding, the 'reverse polarity' phenomenon appears.
The review of fault characterization studies shows that flooding has a significant impact on the concentration-difference internal resistance, while the drying fault has a more obvious impact on the ohmic impedance. Both affect the voltage drop, but the voltage drop is more closely associated with the membrane drying fault, and severe flooding causes cell reversal.
The application scenarios of the three classification methods are compared in Table Ⅱ. Fault level classification and fault characterization are suited to analyzing the effect on the output characteristics of the vehicle power system, whereas the fault interval is more suitable for studying the reaction mechanism and causes in the stack. The fault level provides the basis for alarm thresholds and control strategy formulation, but no unified standard for fault levels has been proposed. The fault characterization method is limited by the staged evolution of output parameters following a fault; for example, the degree of voltage drop depends on the degree of flooding. Moreover, fault types and characterizations do not correspond one-to-one, so feasibility and reliability can be improved by monitoring multiple outputs simultaneously.

FAULT DIAGNOSIS METHODS
Fault diagnosis includes fault detection, fault location and fault identification. It can be divided into reasoning-oriented qualitative analysis and quantitative analysis based on comparison of data.

Qualitative analysis
Qualitative analysis determines the cause and type of a fault through logical reasoning based on the observer's experience; examples include fault tree analysis and expert systems.

Fault tree
Fault tree analysis (FTA) is a layer-by-layer deductive analysis that shows faults and their possible causes from top to bottom in a tree diagram. The fault is regarded as the top event, and the possible causes are connected below it with Boolean logic symbols (gate symbols). Through quantitative calculation, the probability of each fault cause can be analyzed.
In [23], the fault of the fuel cell power system was regarded as the top event and decomposed into the faults of each component and then into their possible causes; the lower-level fault tree expanded from the DC/DC fault is shown in Figure 2. In [24], in a study of PEMFC reliability, the absence of electric energy output from the module was taken as the top event, and the minimal cut sets were solved by the descending method. In [25], the probabilities of the bottom events for hydrogen safety were calculated based on fuzzy mathematics.
FTA is often used to analyze and evaluate the safety and stability of a fuel cell system. Its core lies in determining the depth and breadth of the tree, which requires knowledge of the mechanism and composition of the PEMFC system in order to construct effective bottom events. Because the probabilities of intermediate and basic events are uncertain, fuzzy mathematics performs well in the probability analysis.
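The quantitative step of FTA can be illustrated with a minimal sketch that propagates probabilities through AND and OR gates. It assumes statistically independent basic events and crisp (non-fuzzy) probabilities; the tree structure, event names and numerical values are hypothetical and are not taken from [23]-[25].

```python
from math import prod


def and_gate(probs):
    """Output occurs only if every input event occurs."""
    return prod(probs)


def or_gate(probs):
    """Output occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities (illustrative values only).
p_valve_stuck = 0.02
p_sensor_drift = 0.05
p_compressor_fail = 0.01
p_backup_fail = 0.10

# Hydrogen-side fault: either basic event is sufficient (OR gate).
p_hydrogen_fault = or_gate([p_valve_stuck, p_sensor_drift])
# Air-side fault: compressor and its backup must both fail (AND gate).
p_air_fault = and_gate([p_compressor_fail, p_backup_fail])
# Top event: any subsystem fault.
p_top = or_gate([p_hydrogen_fault, p_air_fault])
```

With fuzzy mathematics, each crisp probability would be replaced by a fuzzy number and the same gate structure evaluated with fuzzy arithmetic; that extension is not shown here.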

Expert system based
The expert system-based method formulates rules from expert experience combined with intelligent technology and, using the data of the stack, imitates an expert in judging fault types. In [26], an expert system was established based on preset fault rules, and power switching control between the fuel cell and the battery was realized by forward inference. In addition, combining the expert system with a neural network can improve its flexibility and accuracy, and the self-learning feature of the neural network can help update and expand the expert knowledge base. In [27], the effectiveness of this method was verified in the fault diagnosis of the hydrogen supply system.
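Forward inference over preset fault rules can be sketched as a simple forward-chaining loop. The rule base below is hypothetical and serves only to show the mechanism; it is not the rule set of [26], and the symptom and action names are invented for illustration.

```python
def forward_chain(facts, rules):
    """Apply rules until no new fact can be derived.

    `rules` is a list of (premises, conclusion) pairs; a rule fires when
    all of its premises are in the current fact set.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rule base (names are illustrative, not from [26]).
RULES = [
    (("voltage_low", "cathode_pressure_drop_high"), "flooding"),
    (("voltage_low", "ohmic_resistance_high"), "membrane_drying"),
    (("flooding",), "increase_air_flow"),
    (("membrane_drying",), "increase_humidification"),
]
```

Starting from the observed symptoms, the loop derives first the fault type and then the associated mitigation action, which mirrors the data-driven direction of forward inference.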

Quantitative analysis
Quantitative analysis detects and identifies faults by comparing the changes and differences in PEMFC system test data. Experiment-based, model-based and data-based methods are mainly adopted in quantitative analysis.

Experiment-based method
The experiment-based method observes fault characteristics and output characteristics under different working conditions by simulating faulty operation, and obtains thresholds of the parameters that are strongly coupled to the fault, which provides a data reference for avoiding faults during fuel cell operation.
In [28], fan overheating, flooding, low voltage and temperature control faults were simulated to analyze the changes in the net power and stack power curves. In [29], the characteristics of the anode gas pressure drop at different stages of flooding were studied in a flooding experiment in which the current, temperature, gas pressure, excess coefficient and humidification parameters were set. In [30], the authors took the cell current and voltage as outputs and obtained fault thresholds through simulation experiments to identify flooding and drying faults. The thresholds and characterization parameters obtained by experiment provide data support for modeling and simulation, and in particular can serve as the basis for fault classification.

Model-based
The principle of the model-based diagnosis method is to judge fault information by analyzing whether the model output residuals exceed a threshold, as shown in Figure 3. The main steps are model establishment, model verification and residual analysis. The commonly used models can be divided into mechanism models and experience-mechanism hybrid models.
(1) Mechanism model
The mechanism model describes the internal reactions and output characteristics of the fuel cell by expressing microscopic changes such as energy, mass, flow field and current transport through differential equations.
In [31], models were selected to simulate the internal reactions and fuel flow field characteristics of the fuel cell in order to build a single-cell model, as summarized in Table 3. In [32], in the dynamic modeling of thermal management, the Nernst-Planck equation was used to express the water flow from anode to cathode, and an empirical formula was used to describe the relationship between water content and water activity at gas-phase equilibrium. Moreover, in [33], an air supply system model was established, including a cathode air flow model, a supply and return pipe model and a fan model, which is helpful for research on the air supply system.
The equivalent circuit can avoid interference in the actual measurement, so it is generally accepted to obtain the electrochemical impedance spectroscopy (EIS) from the equivalent circuit instead of the experimental test. Its principle is to use circuits to simulate the physical behavior of the stack: the internal power generation process is represented by a voltage source, and resistors represent the energy loss of the fuel cell chemical reaction, as in the Randles equivalent circuit model shown in Figure 4 [34]. Obtaining and analyzing the EIS of the fuel cell by establishing its equivalent circuit model is also called the AC impedance method [35]. In [36], an optimized Randles circuit was proposed, as shown in Figure 5. An m-sequence signal is superimposed on the output current to measure the impedance at multiple frequencies in one injection, which greatly improves efficiency.
The equivalent circuit models built in most studies are obtained by adopting the classical Randles model or by experimental fitting. For a vehicular PEMFC with variable operating conditions, the internal reactions and the input-output behavior may be affected by the external environment, so obtaining equivalent circuit models that closely match specific stacks and operating conditions deserves further study.
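As an illustration of how an impedance spectrum follows from an equivalent circuit, the sketch below evaluates a simplified Randles-type circuit: an ohmic resistance in series with a charge-transfer resistance in parallel with a double-layer capacitance. The Warburg (diffusion) element is deliberately omitted, and the symbols `r_ohm`, `r_ct`, `c_dl` and their values are assumptions for illustration; they are not the parameters of the circuits in Figure 4 or Figure 5.

```python
import math


def randles_impedance(freq_hz, r_ohm, r_ct, c_dl):
    """Impedance of R_ohm in series with (R_ct parallel to C_dl)."""
    omega = 2.0 * math.pi * freq_hz
    return r_ohm + r_ct / (1.0 + 1j * omega * r_ct * c_dl)


# Hypothetical parameter values, for illustration only.
R_OHM, R_CT, C_DL = 0.005, 0.010, 1.0  # ohm, ohm, farad

for f in (0.1, 1.0, 10.0, 100.0, 1000.0):
    z = randles_impedance(f, R_OHM, R_CT, C_DL)
    print(f"{f:8.1f} Hz  Re={z.real:.5f} ohm  -Im={-z.imag:.5f} ohm")
```

In this simplified circuit the real part tends to `r_ohm` at high frequency and to `r_ohm + r_ct` at low frequency, so a change in either resistance shifts the corresponding intercept of the Nyquist plot.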
EIS-based analysis can distinguish different faults, which is effective for fault classification, but it is still hard to obtain the EIS of high-power stacks quickly. The least squares method minimizes the sum of squared errors between the estimates and the actual data, and can be used to optimize models, including equivalent circuits. In [36] [37] [38], the parameters of the polarization curve, the empirical-mechanism hybrid model and the equivalent circuit were all identified online by least squares, and the results showed that the models are reliable.
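The least squares idea can be shown on the simplest possible case: fitting a straight line to the ohmic region of a polarization curve, whose negative slope gives an area-specific resistance. This is a batch fit on synthetic points, assumed here purely for illustration; the online identification schemes of [36]-[38] are not reproduced, and the name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Synthetic points in the ohmic region (illustrative, not measured data).
current = [0.2, 0.4, 0.6, 0.8]       # A/cm^2
voltage = [0.76, 0.72, 0.68, 0.64]   # V

v0, slope = fit_line(current, voltage)
area_specific_resistance = -slope    # ohm*cm^2
```

Nonlinear models such as a full polarization curve or an equivalent circuit require iterative or recursive least squares, but the objective being minimized is the same sum of squared errors.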
(2) Experience-mechanism hybrid model
In the experience-mechanism hybrid model (EMM), the empirical part uses coefficients fitted to experimental data to express, or to correct, the undetermined parameters that are difficult to describe mechanistically, while the mechanism part expresses the reactions that are hard to measure. Because cells are small, the transfer process and local distribution of the internal reactants are not easy to determine by testing, so the mechanism model is suitable for expressing the internal reactions [39]. Drawing on test experience makes the model simpler and saves solving time, and the model does not depend on experimental optimization, which improves its universality.
The use of EMM approaches is illustrated in Table 4. It can be seen that the mechanism model is often used to describe internal reactions, such as gas diffusion and component transport. There are some deviations in the residual calculation of the model-based method. Adding equipment can improve the accuracy but sacrifices some stability, so the two need to be traded off in use.
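The residual-threshold principle of model-based diagnosis can be sketched as follows. The persistence counter is an added assumption, introduced here so that a single noisy sample does not raise an alarm; the function name, the threshold and the sample values are hypothetical.

```python
def detect_fault(measured, predicted, threshold, persistence=3):
    """Return the index at which a fault is declared, or None.

    A fault is declared once |measured - predicted| has exceeded
    `threshold` for `persistence` consecutive samples, which keeps a
    single noisy sample from raising an alarm.
    """
    run = 0
    for k, (y, y_hat) in enumerate(zip(measured, predicted)):
        if abs(y - y_hat) > threshold:
            run += 1
            if run >= persistence:
                return k
        else:
            run = 0
    return None
```

Choosing the threshold is itself a trade-off: a tight threshold detects faults earlier but reacts to model deviations, while a loose one tolerates model error at the cost of delayed detection.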

Data-based
Data-based methods are essentially data processing and fault classification based on statistical theory. They combine mathematical methods and algorithms to analyze historical operating data and select several diagnostic variables for feature extraction and fault classification.
The process mainly includes data acquisition, pre-processing, feature extraction and fault classification, as shown in Figure 6 [42]. The methods commonly used in data processing can be divided into multivariate statistical analysis, spectrum analysis and artificial intelligence.
(1) Multivariate statistical analysis
Multivariate statistical analysis is often used in data dimensionality reduction and signal feature extraction; it transforms correlated variables into uncorrelated ones, so as to reduce the complexity of the analysis while retaining as much information from the original variables as possible [43].
In [44], 12-dimensional original data were reduced to a 4-dimensional fault feature vector with a component contribution rate of nearly 90%. The data points were scattered over the whole scale, and the overlap between the data of the three fault states and the normal state was very small; that is, the feature extraction is reliable.
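The notion of a cumulative contribution rate can be made concrete with a short sketch. It assumes a principal-component-style reduction in which each retained component has an associated variance (eigenvalue of the covariance matrix); this is an assumption about the method, and the eigenvalues below are invented for illustration rather than taken from [44].

```python
def components_for_target(eigenvalues, target=0.90):
    """Smallest number of leading components whose cumulative
    contribution rate (share of total variance) reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k, cumulative / total
    return len(ordered), 1.0


# Hypothetical eigenvalues for 12 original variables (illustrative only).
EIGENVALUES = [6.0, 3.0, 1.5, 0.6, 0.3, 0.2, 0.15, 0.1, 0.05, 0.05, 0.03, 0.02]
n_components, rate = components_for_target(EIGENVALUES, target=0.90)
```

With these illustrative values, four of the twelve components are needed to pass the 90% target.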
(2) Spectral analysis
The wavelet transform is often used for signal analysis in fuel cell fault diagnosis. Under different operating conditions, the intensity of the acquired signal varies with the frequency band, and the frequency spectrum can reflect faults [45]. Therefore, faults of the system can be diagnosed by calculating and analyzing the eigenvalues of different frequency bands.
In [46], the energy eigenvalues in the high- and low-frequency bands were obtained by wavelet decomposition and reconstruction of the output voltage signals in order to diagnose faults. The results showed that the characteristic values of the selected nodes differ obviously from those in the normal state. In [47], the Daubechies 5 wavelet was selected as the wavelet basis for voltage-signal fault detection because of its better performance.
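Band energies from a wavelet decomposition can be sketched with the Haar wavelet, which is used here only because it fits in a few lines of dependency-free code; it is a substitute for, not an implementation of, the Daubechies 5 basis selected in [47]. The function names and the requirement that the signal length be divisible by `2**levels` are assumptions of this sketch.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail); the signal length must be even.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def band_energies(signal, levels):
    """Energy per band: [detail_1, ..., detail_levels, approximation]."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    energies = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in current))
    return energies
```

Because the transform is orthonormal, the band energies sum to the energy of the original signal, so their relative shares can serve as a feature vector: a steady voltage puts all energy in the approximation band, while rapid fluctuation moves energy into the detail bands.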
(3) Artificial intelligence approaches
The basic algorithms commonly used for fault classifiers based on artificial intelligence (AI) include the neural network (NN), support vector machine (SVM), fuzzy c-means clustering (FCM), random forest (RF) and other machine learning algorithms, as well as fusion algorithms based on them [48].
In the study of flooding and drying fault diagnosis, fault features were extracted by the self-learning of a CNN algorithm in [49]. On this basis, the online sequential extreme learning machine (OS-ELM) was used to improve the learning efficiency and computing speed in [50]. In summary, to improve the reliability and computing speed of the algorithms, most studies combine multiple algorithms for parameter optimization. The multi-class relevance vector machine (mRVM) uses a Bayesian framework to overcome the binary-classification limitation of the SVM; combined with FCM to pre-screen the samples, it improves their validity and removes the Mercer restriction on the kernel function [51]. The K-means clustering algorithm can update the SVM classifier and adjust it adaptively [52]. Particle swarm optimization (PSO) optimizes the kernel function parameters of the SVM [53] [54]. The NN and SVM are more effective in classifying new faults when the characteristics of the known fault types are known [55] [56].
The data-based method does not depend on the physical characteristics and working mechanism of the models; with the development of computer and sensor technology, data acquisition has become easier and calculation precision has improved. However, noise and electromagnetic interference affect data reliability, and online vehicular use requires a great deal of data processing and places high demands on computing power and storage capacity.
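Of the algorithms listed above, K-means is the easiest to show in full. The sketch below is a plain K-means on small feature vectors with user-supplied initial centroids; it illustrates only the clustering step and does not reproduce the K-means/SVM combination of [52]. The feature values are hypothetical.

```python
def kmeans(points, centroids, iterations=20):
    """Plain K-means on tuples of floats, starting from given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: mean of the points assigned to each centroid.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids
```

In a diagnosis pipeline the input points would be the extracted feature vectors, and the resulting cluster labels could be used to pre-screen or relabel samples before a classifier is trained.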

Brief summary
Summarizing the fault diagnosis methods above, qualitative analysis is suitable for assessing the safety of the fuel cell system: by analyzing the causes, inferring the fault type and estimating the probability of a fault, action can be taken in advance. Among the quantitative methods, experimental methods are reliable but time-consuming. The model-based method can make up for the cost and time, and has fewer uncertain parameters and less computation, which makes it more convenient for engineering application; however, a single corresponding model has obvious limitations. The data-based method is highly adaptable and can meet the requirements of online diagnosis under multiple operating conditions, but it requires more data and stronger computing power.
To seek better diagnostic performance, many studies tend to blend multiple approaches, such as the AC impedance method combined with algorithms like fuzzy clustering [57]. Moreover, on the basis of equivalent circuit simulation, fault diagnosis can be carried out with data processing technology to achieve high efficiency and accuracy.

FAULT TOLERANCE CONTROL
When the vehicular PEMFC system fails in operation, it is difficult to repair manually. Therefore, it is important to compensate for the fault in time according to the fault diagnosis results, so as to keep the fuel cell engine output steady and the vehicle running smoothly.
Fault-tolerant control (FTC) uses effective control methods to remedy, repair or prevent faults so as to ensure the basic operation of the system. It includes passive fault-tolerant control and active fault-tolerant control (AFTC). The former keeps the system stable by using a controller designed with the fault condition in mind, and its control strategy remains unchanged, so it is inflexible and has obvious application limitations [58]. By comparison, the latter can actively adjust the controller and control law and therefore adapts better. This paper therefore mainly discusses active fault-tolerant control. As shown in Figure 7, the AFTC acts on the output of the diagnosis module: the decision-making module defines the best action, and each controller then executes the selected strategy. Active fault tolerance is divided into hardware redundancy and software redundancy (also known as 'analytical redundancy'); the former carries out fault-tolerant control through backups of multiple subsystems and components, and the latter through parameter reconstruction.

Hardware redundancy
Hardware redundancy uses multiple execution units to accomplish the same task, or sets up standby devices, so that when one of them fails the system continues to work with the redundant hardware.
In [59], a master-backup microcontroller scheme was designed for the control system of a fuel cell bus. When the master controller fails, the backup controller executes a simpler control strategy to keep the system running. Similarly, in [60], the authors proposed a nonlinear controller based on the feedback linearization (FBL) technique, which switches to a pre-designed redundant actuator when the pressure valve fails and does not recover.
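The master-backup idea can be sketched as a selector that falls back to the backup command when the master stops reporting. The heartbeat-with-timeout mechanism, the class name `Failover` and the timing values are assumptions made for illustration; how the failure of the master is actually detected in [59] is not described here.

```python
class Failover:
    """Select the master output while its heartbeat is fresh, else the backup."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_heartbeat = None

    def heartbeat(self, now_s):
        """Record that the master reported itself alive at time now_s."""
        self.last_heartbeat = now_s

    def master_alive(self, now_s):
        return (self.last_heartbeat is not None
                and now_s - self.last_heartbeat <= self.timeout_s)

    def command(self, now_s, master_cmd, backup_cmd):
        """Return (source, command) for the current control cycle."""
        if self.master_alive(now_s):
            return "master", master_cmd
        return "backup", backup_cmd
```

The backup command would typically come from the simpler control strategy, so the switch trades performance for continued operation.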

Software redundancy
Software redundancy achieves fault tolerance through signal or parameter reconstruction based on estimation algorithms; it is divided into signal reconstruction, controller reconstruction and control law reconstruction.

Sensor signal reconstruction
Sensors are the main data source for fuel cell fault diagnosis. Signal reconstruction ensures that the signal describes the measured quantity as accurately as possible under the influence of faults, commonly by observer-based estimation methods [61].
The Kalman filter is a common parameter estimation method; it uses linear state equations and recursion to estimate the current value from the previous estimate and the latest observation. A system using a single Kalman filter has poor robustness when one of the sensors fails. The combined Kalman filter method splits the filter into multiple local filters, each corresponding to a sensor, and a main filter that fuses the local outputs, thus improving real-time performance and robustness. In [62], sensor fault tolerance was achieved by fusing the reference sensor signal with those of other sensors based on the combined Kalman filter. The simulation results showed that the combined Kalman filter estimates better than the traditional one. In [63], the fusion algorithm of the combined Kalman filter was used to relax the requirement that redundant sensors have the same parameter characteristics. The extended Kalman filter (EKF) extends the standard filter to nonlinear systems; in [64], the authors proposed using state estimates from the EKF instead of the measured values to handle sensor faults of the engine system.
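The recursion described above, in which the current value is estimated from the previous estimate and the latest observation, can be sketched for a scalar state. The model coefficients `a` and `h`, the noise variances `q` and `r`, and the convention of passing `z=None` for a sensor flagged as faulty are assumptions of this sketch; it is a standard single filter, not the combined filter of [62] [63] or the EKF of [64].

```python
def kalman_step(x_est, p_est, z, q, r, a=1.0, h=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    Model: x_k = a*x_{k-1} + w (variance q), z_k = h*x_k + v (variance r).
    Pass z=None to skip the update, e.g. when the sensor is flagged faulty.
    """
    # Predict from the previous estimate.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    if z is None:
        return x_pred, p_pred
    # Update with the latest observation.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new
```

When the update is skipped, the estimate coasts on the model and its variance grows by `q` each step, which is what allows the estimate to stand in for a faulty sensor for a limited time.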
For highly nonlinear systems, the accuracy of the Kalman filter method is still insufficient, and many other methods are used. In [65], the authors proposed a position estimation algorithm based on a phase-locked loop (PLL), in which the estimated angular velocity and electrical angle were used as the control basis for the position sensor. In [66], active fault-tolerant control of the fuel cell outlet temperature was realized by a sliding mode controller with a control accuracy within 0.5 °C. In [67], the stack current predicted by a neural network replaced the measured value of the faulty current sensor in the air supply system; the neural network was trained on data from sensors that are highly correlated with the stack total current sensor.
Filter methods such as the Kalman filter are simple while still ensuring the filtering effect, but how to improve the estimation accuracy remains a concern for follow-up research. Neural network-based methods form residual sequences from predicted and actual values; they are not limited by the model and have a wide range of applications, but the large amount of computation and the convergence speed restrict their onboard implementation.

Control law reconstruction
Active fault-tolerant control of the PEMFC system mainly deals with different types of faults through reconstruction of the control law, which includes control law rescheduling and reconfiguration. In control law rescheduling, the gain parameters of the control law needed for each fault are calculated in advance and stored; then, according to the fault diagnosis result, the corresponding gain parameters are selected to obtain the control law for that scenario. Research experience has shown that this method works better on the basis of expert systems. It reduces the amount of calculation and shortens the computing time, but it cannot be adjusted according to the actual running status, so its accuracy and tracking performance in vehicle applications are poor.
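Control law rescheduling amounts to a table lookup followed by an ordinary controller update, as the following minimal sketch shows. The PI structure, the gain values and the names `GAIN_TABLE` and `ScheduledPI` are hypothetical; the fallback to the normal gains for an unrecognized diagnosis is also an assumption of this sketch.

```python
# Gains computed offline for each diagnosed state (hypothetical values).
GAIN_TABLE = {
    "normal": {"kp": 1.0, "ki": 0.10},
    "flooding": {"kp": 1.6, "ki": 0.05},
    "membrane_drying": {"kp": 0.7, "ki": 0.20},
}


class ScheduledPI:
    """PI controller whose gains are looked up from a pre-stored table."""

    def __init__(self, table, state="normal"):
        self.table = table
        self.state = state
        self.integral = 0.0

    def reschedule(self, diagnosed_state):
        """Switch gains; unknown states fall back to the normal gains."""
        self.state = diagnosed_state if diagnosed_state in self.table else "normal"

    def update(self, error, dt):
        gains = self.table[self.state]
        self.integral += error * dt
        return gains["kp"] * error + gains["ki"] * self.integral
```

The sketch also makes the stated limitation visible: the gains depend only on the diagnosed label, not on the actual running status, so any fault outside the table is handled with gains that were never designed for it.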
Control law reconfiguration adjusts the control law online for fault-tolerant control when a fault occurs, either by designing in advance the control laws corresponding to the possible faults or by adjusting the control law online according to the fault diagnosis results. There are two main approaches:

Adaptive estimation method based on multi-model
Multi-model adaptive estimation selects online, from a set of pre-designed state-space models, the model closest to the real system in its current state; its core is accurate matching.
In [68], three controllers based on the feedback linearization algorithm were designed for the flooding, membrane drying and normal states. The controllers were switched online according to the diagnosed fault type, and accuracy was improved by updating the weights and thresholds with a rolling optimization controller. In [69], the fault and control variables were connected in advance with Petri nets, and possible mitigation actions were set for each fault. The controller is mainly used to switch between fault conditions and mitigation actions.
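The matching step of multi-model estimation can be sketched as choosing the candidate model with the smallest prediction error over a recent window of data. For brevity the candidates below are static current-to-voltage maps rather than state-space models, and their coefficients are invented for illustration; the scheme of [68] is not reproduced.

```python
def select_model(models, inputs, outputs):
    """Return the name of the model whose predictions best match the data.

    `models` maps a name to a function u -> predicted output; the score
    is the sum of squared prediction errors over the recent window.
    """
    def score(predict):
        return sum((y - predict(u)) ** 2 for u, y in zip(inputs, outputs))

    return min(models, key=lambda name: score(models[name]))


# Hypothetical static voltage models for three states (illustrative only).
MODELS = {
    "normal": lambda i: 0.80 - 0.20 * i,
    "flooding": lambda i: 0.78 - 0.35 * i,
    "membrane_drying": lambda i: 0.80 - 0.30 * i,
}
```

The selected name would then index the controller designed for that state, which is where accurate matching becomes decisive: a wrong match switches in a controller tuned for a different fault.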

Adaptive model based on neural network
Neural network algorithms excel at real-time parameter adjustment and online tracking, which improves the adaptability of the control model to faults.
In [70], a fault-tolerant strategy based on sliding mode control was proposed for the thermal management system of the PEMFC; the temperature was tracked by constructing an equilibrium surface and a switching rate based on a sine function. Moreover, in [71], the BFA method was used to optimize the voltage controller online: the controller parameters are corrected according to the voltage gain to minimize the voltage deviation.

Brief summary
Hardware redundancy makes the system structure more complex and inconvenient. In contrast, the software redundancy method is more flexible; with the better methods that artificial intelligence technology provides for online controller reconfiguration, it has become the preferred approach.

DISCUSSION
At present, significant research results have been obtained in fault diagnosis of the PEMFC, but some limitations have not yet been overcome. The development of measurement technology and AI provides opportunities for real-time, multi-fault and high-power stack diagnosis. In view of the limitations of current research and the future development trend of fault diagnosis, several problems are worth discussing:

Consider the vehicle running conditions
The operating conditions of a vehicular fuel cell system are complex, so the more pronounced variability and randomness of faults need to be considered. For example, as running time accumulates and temperature and humidity change, flooding occurs in different degrees, progressing from mild to moderate to severe flooding; real-time diagnosis is therefore very important. Combining a variety of algorithms can be considered.

Establish the fault classification index
Researchers have not yet put forward a uniform standard for fault levels, which limits the adoption of this method. Quantitative evaluation criteria for different fuel cell systems, such as inefficiency, can improve the efficiency of the research.

Consider the overall stack system
Many scholars prefer to study the variation of parameters such as the voltage, resistance and output power of the cell(s). However, the vehicular PEMFC system is a power source, and its overall output characteristics are very important to vehicle operation. Therefore, as data acquisition technology continues to advance, fault diagnosis based on the output of the entire stack needs further study.

Combine the fault tolerance and diagnosis
At present, research on vehicular PEMFC faults mainly focuses on diagnosis methods, and the diagnosis and fault-tolerant control of the fuel cell system are mostly discussed separately, which ignores the influence of fault type and occurrence frequency on the fault-tolerant control strategy. Integrated research on the two will therefore improve accuracy and effectiveness.

Research the fault-tolerant control under vehicle-borne condition
The fault-tolerant control strategy of a vehicular fuel cell system should be simple, since the system is especially sensitive to additional components and equipment. Secondly, timeliness and robustness directly affect the running stability of the vehicle and need to be improved continuously. In addition, multi-fault tolerance is difficult, so, as with fault diagnosis, a combination of control strategies can be considered.

Break through the limitations of methods and equipment
The computing power of the hardware and the precision of the acquisition equipment restrict the application of artificial intelligence methods, so research on multi-directional platform construction is needed in diagnosis and control.

CONCLUSIONS
Fuel cells are highly energy efficient and environmentally friendly, which gives them great advantages in automotive applications. However, their structure and mechanism are complex, and many factors affect stable operation, which leads to a high fault rate. To improve stability, reliable diagnosis and fault-tolerant control are important tasks.
This paper summarizes the research on fault diagnosis and fault-tolerant control of fuel cells, including fault classification methods and their application scenarios. Three common kinds of fault diagnosis methods are described, and their characteristics and applications are compared according to the current state of the technology. In addition, cases of active fault-tolerant control are summarized, the methods and key steps are introduced, and the limitations and difficulties are discussed.