Rethinking Model-Based Fault Detection: Uncertainties, Risks, and Optimization Based on a Multilevel Converter Case Study

This article presents a probabilistic framework for assessing uncertainty and failure risk in model-based fault detection (MBFD) of power electronic systems. The proposed methodology encompasses uncertainty factor selection, uncertainty propagation, risk assessment, sensitivity analysis, and the development of tailored solutions to optimize MBFD performance. By quantifying two types of misdiagnosis, the risk-of-failure of MBFD is evaluated under diverse random conditions. In a detailed case study on a modular multilevel converter (MMC), the framework is applied to five different methods and reveals that existing MBFD methods can have misdiagnosis rates of up to 20% due to uncertainties. By identifying the leading uncertainty factors and mitigating their impacts, we have reduced the misdiagnosis rate to below 0.4%. While the MMC case study exemplifies practical implementation, the framework's generality makes it applicable to optimizing fault detection across diverse power electronics applications.


I. INTRODUCTION
With the increasing use of power electronic converters in safety-critical applications [1], [2], [3], the consequences of power electronic system failures have become increasingly severe. Timely and robust fault detection is vital to mitigate risks and prevent catastrophic damage. Among various fault detection approaches [3], [4], model-based fault detection (MBFD) techniques are a preferred choice for power electronic applications because they offer an explicit physical interpretation of the detection and leverage existing sensors without additional hardware.
Existing studies of MBFD methods in power electronics often report validations with seemingly 100% fault detection accuracy [4], [5]. The inherent uncertainties and associated risks have not received adequate attention [6]. For instance, notable contributions have focused on advancing fault-detection functionalities [7], [8], [9], [10]. The robustness of these methods has often been assessed by a few deterministic scenario tests that do not reflect the stochastic nature of the system states. Moreover, no information is provided regarding the risk-of-failure of the proposed methods. As a result, although their validations show good effectiveness under laboratory conditions, their performance under practical conditions is difficult to guarantee.

TABLE I RISK-OF-FAILURE OF FAULT DETECTIONS IN DIFFERENT APPLICATIONS
In practical applications, uncertainties significantly affect the performance of MBFD methods. As shown in Fig. 1, the detection of a fault appears straightforward under ideal conditions without uncertainties. However, practical implementations subject to various uncertainties can yield two risk-of-failure outcomes: false alarms and missed alarms. False alarms arise when the MBFD system incorrectly identifies a fault in the absence of any actual fault, resulting in operational disruptions [19]. Conversely, missed alarms occur when a fault remains undetected, introducing severe risks [15]. As listed in Table I, fault detection failures are common across different applications. For example, an industrial report [11] highlighted numerous false alarms and a missed-alarm rate of up to 75% in wind applications. From this perspective, directly implementing an MBFD method without adequately assessing its uncertainties and associated risks can be hazardous rather than beneficial to the system.
While uncertainties have been studied in the design [20], [21] and reliability [2], [22] of power electronics, an explicit framework for assessing the uncertainties of MBFD is still lacking. To begin with, many existing MBFD studies [7], [9] rely on a limited number of specific scenarios to test robustness and evaluate the impact of uncertainties. However, this approach cannot reveal the full spectrum of issues arising from uncertainties. Furthermore, uncertainty propagation and risk assessment in MBFD are often oversimplified, neglecting the coupling effects among different factors [7], [8], [23], [24]. As a result, the proposed solutions for mitigating uncertainty-related risks frequently rely on manual and empirical methods [1], [10], [25], such as adding filters to reduce noise or simply adjusting fault-detection thresholds. A holistic and systematic assessment of MBFD uncertainties is therefore highly needed.
In this article, we propose a risk-driven probabilistic framework to analyze the impact of multiple uncertainty factors in MBFD. The framework includes uncertainty factor selection, uncertainty propagation, risk assessment, sensitivity analysis, and optimization. Based on a modular multilevel converter (MMC), an existing MBFD method [7] for insulated-gate bipolar transistor (IGBT) open-circuit faults is considered as a case study. The main contributions of this article are as follows.
1) The proposed uncertainty factor selection is based on an explicit understanding of the system. The analysis reveals that the impact of uncertainties on the MBFD has two parts: a dc bias and ac fluctuations. Different uncertainties are intricately coupled rather than independent.
2) To simultaneously consider the coupling effects among different uncertainties, a Monte Carlo analysis is employed. The risk-of-failure of the MBFD is defined as two conflicting error rates. The analysis reveals that conventional threshold adjustment has limited ability to improve the MBFD performance with respect to both error rates.
3) The sensitivity analysis identifies the leading uncertainty factor of the studied case. The subsequently proposed observer-enhanced method compensates for this uncertainty, which simultaneously reduces both error rates of the MBFD method.

II. UNCERTAINTIES IN MBFD
MBFD approaches are based on comparing a measured signal, the actual plant output, with its estimated value derived from an explicit mathematical model of the system (see Fig. 2, top).
Ideally, the difference, termed the residual, should be zero when the system is healthy and deviate from zero when a fault is present. However, the actual residual of MBFD is inherently influenced by two categories of uncertainty: aleatory (irreducible) and epistemic (reducible). These are often coupled in real-world systems, and empirical, qualitative analysis may not capture or prioritize them adequately. To address this challenge, this article introduces a risk-driven probabilistic framework for assessing uncertainties in MBFD. The method entails identifying potential uncertainties, prioritizing them based on quantified risks, and tackling the most relevant ones to keep the MBFD at an acceptable risk level. The bottom of Fig. 2 shows a detailed flowchart of this process. Initially, uncertainty sources are identified comprehensively through a model analysis of the MBFD. Subsequently, uncertainty propagation and two defined risk-of-failure metrics quantify the impact of each uncertainty on the MBFD performance. This quantification allows the uncertainties to be prioritized so that the most critical risks are managed first. A more explicit case study and its optimization are elaborated in the subsequent sections.

III. MOTIVATION CASE STUDY OF THE MBFD AND THE ANALYTICAL IMPACT OF UNCERTAINTIES
In this section, an existing MBFD method [7] for the IGBT open-circuit fault of the MMC is used as a motivating case. Compared to the state of the art, Zhou et al. [7] carried out excellent robustness tests to consider the impact of uncertainties. However, how to model uncertainties systematically and quantify their impacts remains unclear.

A. Configuration of the MMC and an Existing MBFD Method
A schematic of a three-phase MMC is shown in Fig. 3. Each phase of the MMC has two arms, where each arm consists of $N$ series-connected half-bridge submodules (SMs) and an arm inductor $L_0$ (the upper and lower arm inductances are also denoted as $L_p$ and $L_n$ to account for their differences). Phase a is taken as an example, and the phase subscript is omitted for simplicity. The upper and lower arm currents and voltages are denoted as $i_p$, $i_n$, $u_p$, and $u_n$, respectively. The ac-side current $i_g$ and the circulating current $i_{cir}$ are given by (1).
Each SM consists of two IGBTs with two antiparallel diodes and a capacitor. The two IGBTs are controlled with complementary gate signals, resulting in two switching states, i.e., insert or bypass. The corresponding output voltage of the SM is denoted as $u_{sm,i}$, which is expressed as
$$u_{sm,i} = S_i u_{c,i} \tag{2}$$
where $S_i$ is the binary switching function of the $i$th SM and $u_{c,i}$ is the corresponding capacitor voltage.
Open-circuit faults of the IGBTs in the MMC are a noteworthy issue due to their severe consequences. Zhou et al. [7] proposed an MBFD approach based on model predictive control (MPC), as shown in Fig. 3. Given the switching states known to the MPC controller and the existing sensors of the MMC, the open-circuit fault is identified by checking the residual between the measurement and the estimation in the previous control cycle. For example, the measured sum of the upper and lower arm voltages $u_m$ and the corresponding estimation $u_e$ are computed from $i_p$, $i_n$, $U_{dc}$, and $u_{c,i}$, which are obtained from the sensors of the MMC system, the known model parameter $L_0$, and the sampling period $T_s$. To make the residual independent of the SM capacitor voltage, a normalized residual $\varepsilon$ is used. Under ideal conditions, the residual $|\varepsilon|$ is zero when there is no fault and above zero when a fault is present. To account for inevitable uncertainties, a threshold $\varepsilon_{th}$ is typically employed, and a fault is identified only when the residual satisfies $|\varepsilon| > \varepsilon_{th}$. The threshold in [7] is $\varepsilon_{th} = 0.8$. Although the experimental results in [7] show good effectiveness under laboratory conditions, whether this empirically selected threshold yields a robust MBFD remains unknown. The remainder of this section first analyzes the uncertainties analytically.

B. Analytical Investigation of the MBFD Uncertainties
To model different uncertainties, an uncertainty factor $\delta_x \in \mathbb{R}^+$ is defined as
$$\delta_x = \frac{\tilde{x}}{x}$$
where $\tilde{x}$ represents the practical value and $x$ is the parameter without considering uncertainties. Substituting the different categories of uncertainties listed in Table II, (3) and (4) can be rewritten in terms of these factors.

TABLE II DIFFERENT CATEGORIES OF UNCERTAINTY FACTORS AFFECTING THE MBFD
The residual considering uncertainties is modeled as (9). To understand the uncertainties behind the MBFD's residual, the variation between the residual under ideal conditions and the residual subject to uncertainties is defined as $\Delta\varepsilon$ in (10). By substituting (1), (5), and (9) into (10), the residual variation is decomposed into two parts as
$$\Delta\varepsilon = \Delta\varepsilon_{dc} + \Delta\varepsilon_{ac} \tag{11}$$
where the two parts are given by (12) and (13). Accordingly, the uncertainties of the residual consist of two parts, i.e., the biased part $\Delta\varepsilon_{dc}$ and the fluctuating part $\Delta\varepsilon_{ac}$. The biased part is mainly affected by the measurement accuracy factors $\delta_{U_{dc}}$ and $\delta_{u_c}$, and the fluctuating part is affected by multiple uncertainty factors, such as $\delta_{L_p}$, $\delta_{L_n}$, $\delta_{i_p}$, $\delta_{i_n}$, and $\delta_{U_{dc}}$. In addition to the steady state, the residual uncertainties also depend on the dynamics of $U_{dc}$ and $i_g$, which are related to the operational conditions and the controller. Moreover, other parameters such as $N$ and $T_s$ also affect the residual by amplifying or attenuating uncertainties. In the next section, the case study will reveal that $\Delta\varepsilon_{ac}$ tends to cause false alarms, while $\Delta\varepsilon_{dc}$ may cause missed alarms by weakening the residual.
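The split of the residual variation into a biased part and a fluctuating part can be illustrated numerically. The following Python sketch is not the analytical decomposition of this article; it assumes that the dc part is simply the mean of a sampled waveform and the ac part is the zero-mean remainder. The function name `split_dc_ac` and the waveform values are hypothetical and chosen for illustration only.

```python
import math

def split_dc_ac(samples):
    """Split a sampled waveform into its mean (dc) and zero-mean remainder (ac)."""
    dc = sum(samples) / len(samples)
    ac = [s - dc for s in samples]
    return dc, ac

# Hypothetical residual variation: a 0.1 bias plus a sinusoidal ripple of
# amplitude 0.5, sampled over exactly one period (illustrative values only).
n = 200
delta_eps = [0.1 + 0.5 * math.sin(2 * math.pi * k / n) for k in range(n)]

dc, ac = split_dc_ac(delta_eps)
ac_peak = max(abs(a) for a in ac)
print(f"dc bias = {dc:.3f}, ac peak = {ac_peak:.3f}")
```

In this simplified view, a large ac peak pushes a healthy residual toward the threshold, whereas a dc bias shifts the whole residual and can lower its level during a fault.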

IV. UNCERTAINTY QUANTIFICATION BASED ON MONTE CARLO EVALUATION
The analytical model above provides a qualitative understanding of the uncertainty sources. However, it is difficult to obtain quantitative results for the uncertainty propagation from the analytical model alone, especially when multiple uncertainty sources and operational dynamics are considered simultaneously. To address this problem, this section employs Monte Carlo analysis to evaluate the uncertainty propagation. The nominal parameters of the analyzed MMC system are listed in Table III.

A. Random Variate Generation
According to the aforementioned analysis, the uncertainty sources fall mainly into three typical categories, namely parameter mismatch, measurement accuracy, and operational variations. In our previous work [22], we introduced a method for selecting variations based on established boundaries, taking into account manufacturing tolerances, environmental conditions, distribution types, and confidence levels. In this article, the variations of the selected parameters strictly follow practical conditions and the existing literature, as given below.
1) Parameter mismatch: Inductance mismatch is selected as ±20% [26] to account for different factors, such as the manufacturing tolerance and the value variations with temperature, saturation, and degradation effects.
For the operational variations, which emulate voltage or load dynamics, $U_{dc}$ can drop from 1 to 0.9 p.u., and $i_g$ can change from 1 to −1 p.u. to account for severe load variations in operation. The first two categories (parameter mismatch and measurement accuracy) are independent random factors, which tend to follow a normal distribution according to the central limit theorem. By contrast, the operational variations are not purely random and are modeled with a uniform distribution. Then, 1000 random multiparameter variations within the defined distributions are generated, and their combinations are shown in Fig. 4. Two selected cases are marked in the distribution and will be analyzed later in detail.
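The random variate generation described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. It assumes that the ±20% inductance tolerance is treated as a 3-sigma bound of a normal distribution and that the voltage measurement accuracy is ±1%; both are placeholders, since the exact spreads are not stated here. The dictionary keys and the function name `draw_sample` are hypothetical.

```python
import random

N_SAMPLES = 1000  # number of parameter combinations stated in the text

def draw_sample(rng):
    """Draw one combination of uncertainty factors (all spreads are assumptions)."""
    def normal_factor(tolerance):
        # Assumption: the tolerance is treated as a 3-sigma bound around 1.0.
        return rng.gauss(1.0, tolerance / 3.0)

    return {
        "delta_Lp": normal_factor(0.20),    # +/-20% inductance mismatch (from the text)
        "delta_Ln": normal_factor(0.20),
        "delta_Udc": normal_factor(0.01),   # hypothetical +/-1% measurement accuracy
        "delta_uc": normal_factor(0.01),    # hypothetical +/-1% measurement accuracy
        "Udc_step": rng.uniform(0.9, 1.0),  # dc-bus voltage after the step, in p.u.
        "ig_step": rng.uniform(-1.0, 1.0),  # ac-side current after the step, in p.u.
    }

rng = random.Random(0)  # fixed seed so the draw is reproducible
samples = [draw_sample(rng) for _ in range(N_SAMPLES)]
print(samples[0])
```

Each dictionary would then parameterize one simulation run of the converter and its fault detection, once in the healthy state and once in the fault state.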
Fig. 4. Distributions of multiple uncertainty factors varying simultaneously. The $U_{dc}$ change and $i_g$ change emulate the operational dynamics. For instance, a $U_{dc}$ change or $i_g$ change of 0.9 p.u. means that the dc-bus voltage or the ac-side current has a step change from 1 p.u. to 0.9 p.u. at t = 0.025 s. The selected Cases I and II will be analyzed later.
An important issue in Monte Carlo analysis is determining whether the number of random parameter combinations is sufficient. The central limit theorem provides a stopping rule [30], which is expressed as
$$e_r = \frac{\Phi^{-1}(1-\alpha/2)}{\bar{Y}}\sqrt{\frac{\sigma^2(Y)}{n}} \tag{14}$$
where $\Phi^{-1}(\cdot)$ is the inverse of the standard Gaussian cumulative distribution function, $\sigma^2(\cdot)$ represents the variance, $\alpha$ sets the desired confidence level ($\alpha = 0.05$ means 95% confidence), $Y$ and $\bar{Y}$ are the output of the Monte Carlo analysis and its average, and $n$ is the total number of Monte Carlo runs. A larger $n$ tends to give a smaller analysis error. The analysis is, thus, stopped once the sample mean error $e_r$ falls below a specified threshold. In this work, $e_r \le 1\%$ and $\alpha = 0.05$ ensure 95% confidence that the analysis gives a relative error of less than 1%. All results provided below have been validated by this criterion.
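A stopping rule of this kind can be checked in a few lines of Python. The sketch below assumes the common form of the rule, in which the relative error is the half-width of the confidence interval of the sample mean divided by the mean itself; the function names `relative_mean_error` and `enough_samples` are hypothetical.

```python
from statistics import NormalDist, fmean, variance

def relative_mean_error(outputs, alpha=0.05):
    """Relative half-width of the (1 - alpha) confidence interval of the sample mean."""
    n = len(outputs)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return z * (variance(outputs) / n) ** 0.5 / abs(fmean(outputs))

def enough_samples(outputs, alpha=0.05, tol=0.01):
    """True once the relative error of the mean is at or below tol (1% by default)."""
    return relative_mean_error(outputs, alpha) <= tol

# Hypothetical Monte Carlo outputs, for illustration only.
few = [1.0, 1.1, 0.9, 1.0]
many = few * 250
print(enough_samples(few), enough_samples(many))
```

Because the error shrinks with the square root of the number of runs, halving the error requires roughly four times as many samples.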

B. Monte Carlo Analysis
To assess the performance of the MBFD method, two categories of misdiagnosis risk are defined as the Type I and Type II error rates, which are given by
$$\text{Type I error rate} = \Pr(|\varepsilon| > \varepsilon_{th} \mid \text{Health}), \quad \text{Type II error rate} = \Pr(|\varepsilon| \le \varepsilon_{th} \mid \text{Fault}). \tag{15}$$
The Type I error rate is the probability that the system is in a healthy state but the detection outcome is faulty. By analogy, the Type II error rate is the probability that the system is faulty but no fault is detected. The top of Fig. 5 shows the residual distributions under the health and fault states, respectively, and the middle part shows the corresponding waveforms of the two selected cases. To begin with, when the MMC system is in a health state (i.e., no open-circuit fault), the top left of Fig. 5 shows the residual distribution subject to different uncertainties. Although the residual is largely concentrated within the defined threshold $\varepsilon_{th} = 0.8$, 153 of the 1000 combinations are above the threshold. By fitting a generalized extreme value distribution, the Type I error rate is 14.30%, which leads to false alarms. A selected residual waveform of Case I is shown in the middle left of Fig. 5. Although the system does not have any open-circuit faults, the uncertainties cause the residual to have a dc bias of around 0.1 and an ac fluctuation of around 0.5. A load step change at t = 0.025 s pushes the residual fluctuation beyond the threshold, leading to a false alarm in this case.
In contrast, when the MMC has an open-circuit fault, the top right of Fig. 5 shows the distribution of the residual. Among the 1000 samples, 992 faults have a residual greater than the threshold of 0.8, and only 8 faults are missed by the detection. The statistical Type II error rate is 0.30%. Similarly, the waveform of a missed alarm (Case II) is also provided. A significant dc bias of around 0.4 exists in the residual, which lowers the residual amplitude during the fault and prevents it from crossing the threshold. Thus, the MBFD is unable to detect the fault in this condition.
According to the abovementioned results and (15), it can also be seen that the threshold $\varepsilon_{th}$ plays a vital role in the misdiagnosis risk. For example, increasing $\varepsilon_{th}$ may reduce the Type I error rate but at the cost of increasing the Type II error rate. Our preliminary effort [31] utilized the proposed framework to optimize the threshold selection considering the two conflicting errors. However, simply shifting the threshold value has limited ability to improve the overall MBFD performance. Prioritizing uncertainty effects and addressing the leading factors is a promising direction to reduce the two error types simultaneously.
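The two error rates and their dependence on the threshold can be illustrated with a short Python sketch. It counts samples directly instead of fitting a generalized extreme value distribution, so it yields empirical rates (for example, 153 exceedances out of 1000 healthy samples would give 15.3%). The residual populations below are synthetic and hypothetical; they are not the data of this article.

```python
import random

def type_i_rate(healthy_residuals, eps_th):
    """Empirical Pr(|eps| > eps_th | Health): fraction of false alarms."""
    return sum(abs(e) > eps_th for e in healthy_residuals) / len(healthy_residuals)

def type_ii_rate(faulty_residuals, eps_th):
    """Empirical Pr(|eps| <= eps_th | Fault): fraction of missed alarms."""
    return sum(abs(e) <= eps_th for e in faulty_residuals) / len(faulty_residuals)

# Synthetic residual populations (illustrative assumption, not measured data).
rng = random.Random(1)
healthy = [rng.gauss(0.4, 0.3) for _ in range(1000)]
faulty = [rng.gauss(1.6, 0.3) for _ in range(1000)]

# Sweep the threshold to expose the trade-off between the two error rates.
for eps_th in (0.6, 0.8, 1.0, 1.2):
    print(eps_th, type_i_rate(healthy, eps_th), type_ii_rate(faulty, eps_th))
```

Raising the threshold can only lower the Type I rate and raise the Type II rate in such a sweep, which is why threshold tuning alone cannot reduce both.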

C. Sensitivity Analysis
The correlations between the residual and the uncertainty factors are computed to identify the leading factors. The Pearson correlation coefficient $\rho_{XY}$, the most common sensitivity analysis method [32], is used in this work. The inputs ($X$), e.g., uncertainties, are ranked by their influence on the output ($Y$), e.g., the residual, which is given by
$$\rho_{XY} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \tag{16}$$
where $\bar{X}$ and $\bar{Y}$ are the corresponding average values. The value of $\rho_{XY}$ ranges from −1 to 1, and a larger absolute value (from 0 to 1) indicates a stronger correlation. The bottom of Fig. 5 shows the correlations of the residual in the health and fault states, respectively. Combining the analytical investigation of (12) and (13), the following conclusions can be drawn. First, the arm inductance mismatch significantly affects the residual in the health state. This parameter mismatch, together with an operational step change, can amplify the ac fluctuation of the residual and cause false alarms. Next, the measurement accuracy, in particular $\delta_{U_{dc}}$ and $\delta_{u_c}$, plays a vital role in both the health and fault states. The dc bias caused by them is more likely to lead to missed alarms, since the other uncertainty factors typically tend to magnify the residual.
Fig. 7. Comparison of the proposed DOB-enhanced MBFD method and the conventional method under the same distribution of uncertainties. From top to bottom: the residual distribution, the waveforms of Cases I and II, and their sensitivity analysis.
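The ranking step of the sensitivity analysis can be sketched in Python as follows. The sketch computes the standard Pearson coefficient and sorts the factors by its absolute value; the function names `pearson` and `rank_factors` and the sample data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_factors(factors, residuals):
    """Return (name, rho) pairs sorted by |rho|, strongest correlation first."""
    scored = [(name, pearson(values, residuals)) for name, values in factors.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical samples of two uncertainty factors and the resulting residual.
factors = {
    "delta_Lp": [0.9, 1.0, 1.1, 1.2],
    "delta_Udc": [1.01, 0.99, 1.01, 0.99],
}
residuals = [0.20, 0.40, 0.60, 0.85]
for name, rho in rank_factors(factors, residuals):
    print(f"{name}: rho = {rho:+.2f}")
```

Ranking by the absolute value keeps strongly negative correlations near the top of the list, which a ranking by the signed value would push to the bottom.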

V. OPTIMIZATION SOLUTION: DISTURBANCE OBSERVER (DOB) ENHANCED MBFD
The aforementioned analysis reveals that inductance mismatch is one of the leading factors affecting MBFD performance, leading to false alarms. By treating this mismatch as an epistemic uncertainty, which can be reduced or eliminated, the subsequent research hypothesis is that a more accurate real-time estimation of the inductance can enhance the MBFD performance.
In this section, a DOB-based inductance estimation method is proposed. The real-time estimated upper and lower arm inductances $\hat{L}_p$ and $\hat{L}_n$ can improve the performance of the MBFD. The uncertainty quantification is then conducted again to compare the performance.

A. DOB-Based Inductance Estimation
Considering the different inductance mismatches in the upper and lower arms, the voltage equations of the upper and lower arms are normalized into a linearized state-space model, where $x$ is the system state, $u$ is the control input, $y$ is the measured output, $d$ is the unknown inductance mismatch that needs to be observed, and $\Phi$, $\Gamma$, $G$, and $C$ are known parameter matrices, which are included in Appendix (A1)-(A4). A DOB-based inductance estimation for $d_k = [d_1, d_2]$ is, thus, given by [33], where $\hat{d}$ is the estimated value of $d$, $z$ is the state variable of the DOB, and $K$ is the observer gain. The stability of the DOB can be achieved if rank($G$) = rank($d$). The specific proof can be found in [33]. Therefore, the upper and lower arm inductances can be estimated from $\hat{d}$, where $\hat{L}$ denotes the estimated inductance values.
According to the proposed DOB-based inductance estimation method, the compensated estimated voltage $\hat{u}_e$ and the corresponding normalized residual $\hat{\varepsilon}$ are rewritten with the estimated inductances. Fig. 6 shows the structure of the DOB-enhanced MBFD method. The upper and lower arm inductances are estimated separately in real time to account for their individual variations.
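The principle of a disturbance observer can be illustrated with a scalar Python sketch. It is not the observer of this article: it assumes a scalar plant $x_{k+1} = \phi x_k + \gamma u_k + g d$ with a constant unknown disturbance $d$, and a generic discrete-time DOB update in which the estimation error shrinks by the factor $(1 - Kg)$ per step. The function name `run_dob` and all numerical values are hypothetical.

```python
def run_dob(phi, gamma, g, k_gain, d_true, inputs, x0=0.0):
    """Simulate x[k+1] = phi*x + gamma*u + g*d and estimate the constant d.

    Observer (generic discrete-time DOB form, assumed for illustration):
        d_hat[k] = k_gain * x[k] - z[k]
        z[k+1]   = z[k] + k_gain * ((phi - 1) * x[k] + gamma * u[k] + g * d_hat[k])
    With these updates the error obeys e[k+1] = (1 - k_gain * g) * e[k],
    so the estimate converges when |1 - k_gain * g| < 1.
    """
    x, z = x0, 0.0
    estimates = []
    for u in inputs:
        d_hat = k_gain * x - z
        estimates.append(d_hat)
        # Observer state update uses only known quantities and the estimate.
        z = z + k_gain * ((phi - 1.0) * x + gamma * u + g * d_hat)
        # Plant update uses the true (unknown) disturbance.
        x = phi * x + gamma * u + g * d_true
    return estimates

# Hypothetical plant and gain: k_gain * g = 0.5, so the error halves each step.
estimates = run_dob(phi=0.95, gamma=0.1, g=0.2, k_gain=2.5,
                    d_true=0.3, inputs=[1.0] * 50)
print(estimates[:4], estimates[-1])
```

In the setting of this section, the estimated disturbance would play the role of the inductance mismatch, from which the arm inductance used by the MBFD model is corrected.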

B. Uncertainty Quantification of the DOB-Enhanced MBFD
The top of Fig. 7 shows the comparison of the residual distributions under the health and fault states, respectively. For the health state, the residual distribution of the DOB-enhanced MBFD becomes more concentrated. Among the 1000 Monte Carlo samples, there are zero false alarms. The statistical Type I error rate is only 0.03%. For the fault state, although the residual distribution of the DOB-enhanced MBFD method becomes more concentrated, the Type II error rate is almost unchanged (i.e., 0.32%). Therefore, the proposed method effectively reduces the false alarms of the MBFD method.
Similarly, the residual waveforms of Cases I and II with the DOB-enhanced method are shown in the middle of Fig. 7. For the health state, the residual fluctuations are significantly suppressed by the proposed method. A further load step change at t = 0.025 s does not cause significant residual variation either. This reveals that mitigating the inductance parameter mismatch can weaken the effect of operational variations.
The bottom of Fig. 7 shows the change in the correlation coefficients after the implementation of the DOB-enhanced method. In the health state, the DOB-enhanced method effectively moves the inductance mismatch from the dominant factor to the last in the ranking. The increased coefficients of $\delta_{U_{dc}}$ and $\delta_{u_c}$ reveal that the voltage measurement accuracy becomes more important for further improving the MBFD performance after the impact of the inductance has been mitigated. However, this may require better sensors, hardware changes, or compensating for sensor errors through modeling [34]. In the fault state, the ranking of the different uncertainty factors remains almost unchanged after the implementation of the DOB-enhanced method.

VI. EXPERIMENTAL RESULTS
An 8 kVA downscaled MMC prototype has been built for experimental verification, as shown in Fig. 8. The detailed specifications are listed in Table IV. Based on the aforementioned analysis, we focus on mitigating the false alarm problem, which is mainly caused by inductance mismatch. Both steady-state and dynamic conditions are considered.
To begin with, the IGBT switch $T_1$ of the fourth SM in the upper arm is disabled to emulate the open-circuit fault, as shown in Fig. 9. At the occurrence of the fault, the residual suddenly [...] false alarms in the four cases. However, the residual can be stabilized after enabling the proposed DOB method. The influence of operational dynamics can be well suppressed by inductance estimation, even under load step changes. These behaviors are consistent with the simulation results presented above.

B. Uncertainty and Failure Risk Assessment of the Existing Methods
To highlight the impact of uncertainties on MBFD methods of power electronic systems, four additional methods [9], [35], [36], [37] are selected and evaluated based on the proposed framework. The risk-of-failure of these MBFD methods is listed in Table V. While all these studies have demonstrated the effectiveness of their methods under specific simulation or experimental conditions, uncertainties and the risk of misdiagnosis remain prevalent in MBFD methods of power electronic systems. Some methods exhibit false alarm rates as high as 20%, which can severely disrupt the system's normal operation. Furthermore, the results in the table show that Type I errors are closely related to parameter mismatches and operational variations, while measurement accuracy significantly impacts Type II errors. The proposed DOB-enhanced MBFD method can mitigate the leading uncertainty factor and improve the overall performance with respect to the two error rates.

VII. CONCLUSION
This article introduces a probabilistic framework for evaluating MBFD performance in power electronic systems. The framework systematically addresses uncertainty factor selection, propagation, risk assessment, sensitivity analysis, and the development of tailored solutions to optimize MBFD. Our investigation highlights several key findings.
1) The impact of uncertainties on MBFD residuals involves both dc bias and ac fluctuation components, with intricate interdependencies among different uncertainty factors. Addressing the leading uncertainty factors is effective in enhancing the MBFD performance.
2) Using two quantified error rates, the analysis demonstrates that a well-validated MBFD method from the current literature underperforms when subjected to multiple uncertainties. This highlights the limitations of conventional robustness tests based on deterministic scenarios.
3) Leveraging the proposed framework, we identify inductance mismatch as a leading factor impairing MBFD performance. We propose a DOB-enhanced MBFD method tailored to mitigate this uncertainty, validated through simulations and experiments demonstrating improved performance under various conditions.
4) Importantly, the proposed framework has been used to evaluate five different MBFD methods from the literature and reveals misdiagnosis rates of up to 20%. This evaluation underscores the necessity of applying a probabilistic framework to assess and enhance MBFD methods for practical applications.
In conclusion, the proposed probabilistic framework not only identifies significant challenges in current MBFD methods but also provides effective strategies for improving their performance in power electronic systems.
Yantao Liao and Yi Zhang, Member, IEEE
Index Terms: Disturbance observer (DOB), fault detection, modular multilevel converters (MMCs), uncertainty quantification.

Fig. 1. Potential outcomes of practical MBFD. The solid black line represents the MBFD's residual under ideal conditions, while the colored line represents the residual with uncertainties.

Fig. 2. MBFD (top) and a risk-driven probabilistic framework to analyze the impact of its uncertainties (bottom).

The two categories of uncertainty that influence the MBFD residual are as follows.
1) Aleatory uncertainty: It arises from the inherent variability or randomness in a system and is often known as stochastic or irreducible uncertainty. Examples include noise in sensor measurements, natural variability in system properties, and random external influences.
2) Epistemic uncertainty: Stemming from incomplete knowledge about the system or potential errors in the model, this is referred to as reducible uncertainty. By providing more information or improving model accuracy, epistemic uncertainty can be minimized. Common examples include inaccuracies in model parameters, model simplifications, and other knowledge gaps.
In real-world systems, these two types of uncertainties are often coupled with each other. Traditional methods based on empirical and qualitative analyses might not fully capture the diverse range of uncertainties or adequately facilitate their prioritization.

Fig. 3. Configuration of an MMC and an existing MBFD method [7] for open-circuit fault detection based on MPC.

Fig. 5. Uncertainty quantification of the existing MBFD method under health and fault states. From top to bottom: the MBFD's residual distribution, the residual waveforms of Case I or Case II, and their sensitivity analysis ($e_r$ is the sample mean error of the Monte Carlo analysis according to (14), and PDF is the probability density function).

Fig. 9. Experimental result of open-circuit fault detection by the MBFD.

TABLE IV MAIN CIRCUIT PARAMETERS USED IN EXPERIMENTS

TABLE V UNCERTAINTY AND FAILURE RISK ASSESSMENT OF STATE OF THE ARTS