A Survey on Fault Tolerant and Diagnostic Techniques of Multilevel Inverter

Multilevel inverters (MLIs) are used in a number of high-power applications, including industrial motor drives, flexible power transmission, and renewable energy systems. Because an MLI design incorporates a large number of devices, the system's reliability suffers. This has drawn the interest of researchers worldwide, who are seeking methods to increase the dependability of MLI designs and to add fault-tolerant properties to them, through either software or hardware. Redundancy has recently gained popularity in solid-state transformer applications, where fault tolerance is highly desirable, because it provides greater fault tolerance. The goal of this survey is to offer a comprehensive overview of the fault-tolerance capabilities of MLIs developed by the research community over the last few years.

The associate editor coordinating the review of this manuscript and approving it for publication was Lorenzo Ciani.

I. INTRODUCTION
Multilevel inverters (MLIs) are gaining widespread acceptance in various sectors of the power industry, such as renewable energy systems, flexible power transmission, adjustable speed drives, and hybrid electric vehicles. High-power MLI topologies comprise a large number of power semiconductor switches and electrolytic capacitors, which, as noted in [1] and [2], are highly susceptible to faults. Semiconductor switches are primarily vulnerable to open circuit (OC) failure, short circuit (SC) failure, and intermittent gate misfiring failure. SC failures are particularly severe and can cause the entire application to shut down, so their diagnosis time should be minimized. While OC and gate misfiring failures do not pose as much of a threat to the system, they do increase distortion in the output waveform. As a result, a subsequent corrective action, either software-based or hardware-based, is taken to restore the post-fault behavior of the system as closely as possible to the pre-fault behavior. This has generated considerable interest in incorporating fault-tolerant characteristics into MLIs. Capacitor faults are outside the scope of this paper.
In recent years, a substantial amount of research on fault tolerance has adopted an application-oriented approach for power electronic traction transformers [3], permanent magnet synchronous motor (PMSM) drives [4], [5], induction motor drives [6], [7], [8], and brushless direct current (BLDC) motor drives [9], among others. Various modifications to conventional MLI topologies, such as cascaded H-bridge (CHB), neutral point clamped (NPC), and flying capacitor (FC), have been proposed [10] in order to provide fault-tolerant properties. Numerous studies have examined the fault-tolerant properties of the following topologies: modular multilevel converter (MMC), cascaded multilevel inverters (CMLIs), T-type inverters, active neutral-point clamped (ANPC) inverters, boost inverters, etc. [11], [12]. MMC and CMLI take advantage of their modularity to add redundant modules to the original topology to enable fault-tolerant operation. Neutral path switches impart inherent fault tolerance to T-type and ANPC converters. Capacitor voltage unbalance in static synchronous compensator (STATCOM) applications or disparity of voltage generation in photovoltaic (PV) applications may occur under fault conditions. Thus, a topological perspective of fault-tolerant operation is intertwined with an application-specific modification of the modulation strategy.
Reliability prediction metrics are used to mathematically predict the life of a power electronics system. These metrics can be defined at either the component or the system level. Component-based reliability research mainly focuses on various factors that affect component reliability, which are documented in MIL-HDBK-217F [13]. System-level reliability analysis considers the effect of component interdependency and standby redundancy [14]. The Markov-chain model is widely accepted for evaluating system-level reliability [15], [16]. The main goal of reliability evaluation is to present a clear picture of component and system-based factors to provide for design trade-offs and to ensure the mathematical model meets the expected requirements.
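As a minimal illustration of how standby redundancy enters a Markov-chain reliability model, the sketch below compares a single unit with a constant failure rate against the same unit backed by one cold-standby spare. The three-state chain, the perfect-switchover assumption, and the numerical failure rate and mission time are illustrative assumptions and are not taken from [13]-[16].

```python
import math

def reliability_single(lam, t):
    """Reliability of one unit with constant failure rate lam (1/h)."""
    return math.exp(-lam * t)

def reliability_standby(lam, t):
    """Closed-form reliability of one unit plus one cold-standby spare.

    Markov states: 0 = primary running, 1 = spare running, 2 = failed.
    Transitions 0->1 and 1->2 both occur at rate lam (assumed: identical
    units, perfect fault detection and switchover, spare does not age).
    """
    return math.exp(-lam * t) * (1.0 + lam * t)

def reliability_standby_numeric(lam, t, steps=100_000):
    """Same model, integrating the state equations with forward Euler."""
    p0, p1 = 1.0, 0.0
    dt = t / steps
    for _ in range(steps):
        dp0 = -lam * p0
        dp1 = lam * p0 - lam * p1
        p0 += dp0 * dt
        p1 += dp1 * dt
    return p0 + p1  # probability of not yet being in the failed state

LAMBDA = 2e-5   # hypothetical failure rate in failures per hour
HOURS = 50_000  # hypothetical mission time in hours

r_single = reliability_single(LAMBDA, HOURS)
r_standby = reliability_standby(LAMBDA, HOURS)
r_numeric = reliability_standby_numeric(LAMBDA, HOURS)
print(f"single unit : {r_single:.4f}")
print(f"with standby: {r_standby:.4f} (numeric {r_numeric:.4f})")
```

The closed-form and numerically integrated results should agree closely, which is a simple sanity check before extending the chain with more states (for example, imperfect switchover or multiple spares).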
Several factors need to be considered to ensure a highly reliable MLI architecture. Fault detection, localization, and isolation are intertwined procedures that must respond quickly to eliminate any threat of collateral damage to the system. Recent research has focused on developing fault detection techniques, with or without additional hardware components. This paper, however, focuses on fault-tolerant schemes, which can be based on software, hardware, or a combination of both. It presents a quantitative and qualitative comparison of various techniques, giving readers insight into the state-of-the-art fault-tolerant techniques for MLIs. It aims to provide a comprehensive review of fault-tolerant MLI topologies published in the past few years with respect to multiple facets, including the type of fault, application requirements, topological modification and innovation, modulation, and reliability quantification. The following sections discuss fault-tolerant strategies grouped into the categories defined later in the paper. At the end of each subcategory, a brief summary highlights the advantages and disadvantages of the discussed technique. Finally, the paper discusses future scope and presents conclusions.

II. INTRODUCTION TO FAULT DETECTION TECHNIQUES
Based on the choice of detection variables, fault detection schemes can be categorized into two types: voltage-based and current-based detection methods. Fault detection techniques that do not employ any voltage/current sensor tend to be more economical. In recent years, a lot of research has been done on fault diagnostic techniques. For example, in [17], the fault diagnostic technique is based on the analysis of the normalized Park's current space vector. A maximum power point (MPP) immune fault detection technique is proposed in [18] for grid-connected PV systems. In [19], the fault detection method is based on current detection, and the duty cycles of the switches are calculated using the method proposed by Alesina and Venturini [20]. The diagnosis approach uses the load currents, the angles of the input and output voltage space vectors, and the duty cycles of the switches. Because the methods in [19] and [21] require no additional hardware, their cost is lower. However, these approaches tend to fail under light-load or no-load conditions because the amplitudes of the input and output currents are both relatively small, and their detection time is long.
To reduce the detection time, a fault diagnosis method has been proposed in [22], which is realized with the knowledge about the switch state and the current sensor location. In [22], the current sensors are moved ahead of the clamp circuit connection to sample the matrix converter currents during the zero vectors. In [23], a predictive control method is used to improve the reliability of the matrix converter. However, it is difficult to diagnose the open-circuit fault switch when the following two conditions are satisfied simultaneously: 1) the frequency of the load current is the same as the frequency of the input voltage, and 2) the phase difference between the input voltage and the output current is π. An additional current sensor is also needed to monitor the clamp current.
In [24], a modified space vector modulation technique is proposed as a general solution for the post-fault operation of multilevel converters (MLCs), assuming independent power sources on the DC side of the converter. The proposed modulation strategy treats the multilevel converter as a two-level converter by introducing an offset vector that adjusts the modulation online under different fault conditions. In [25], level-shifted PWM (LS-PWM) with carrier rotation improves the power distribution. Depending on the current direction, either the switch or its anti-parallel diode (APD) conducts at a particular instant. Under single switch failure (and assuming its corresponding APD is healthy), the three-phase modulating signals are kept 120 degrees apart when the current flows through the APD, whereas for the reverse current direction, the modulating signals of the remaining two phases are kept 60 degrees apart [26]. In [27] and [28], fault detection techniques have been proposed for DC-DC converters and T-type rectifiers, respectively.

III. FAULT TOLERANT MULTILEVEL INVERTERS
In recent literature, a variety of fault-tolerant MLI topologies have been proposed. As depicted in Figure 1, they can be categorized primarily into switch-level, leg-level, and module-level schemes.
The switch-level fault-tolerant switching scheme (FT-SS) is further divided into two categories. In the redundant switching states category, the same output voltage level can be produced by any of several available switching states for a given voltage vector. This scheme is prevalent in three-phase systems, in which the interior voltage vectors possess inherent redundancy. However, only a few studies on single-phase systems have used redundant switching states FT-SS to partially or completely recover the pre-fault waveform characteristics during fault conditions. The second category, DC-bus midpoint connection, refers to FT-SS in which the faulty leg of a three-phase system is permanently connected to the DC-bus midpoint after a fault occurs. Powering a three-phase system from its two remaining functional phases in this way yields a balanced output with a reduced rating.
In the leg-level solution, a fourth, redundant leg is added to a three-phase system. The redundant leg can improve a particular characteristic under healthy conditions, or it can be activated only under fault conditions. In a single-phase system, the redundant leg is primarily activated under fault conditions. The module-level category is further subdivided into three categories. In the neutral shift method, the phase angle between the unbalanced phase voltages is shifted in order to produce a higher and balanced line voltage. This technique therefore applies only to three-phase systems. In addition, the neutral shift method cannot restore pre-fault conditions after a fault occurs. The next category, DC-bus voltage regulation, attempts to recover the pre-fault characteristics by boosting the DC side of the inverter, which is accomplished by placing a Z-source converter ahead of the inverter topology. By simultaneously implementing neutral shift FT-SS, the increased voltage stress resulting from DC-bus voltage regulation FT-SS can be distributed equally among the three phases. In the final category, redundant module, additional modules are added in series to the original topology, and when a fault occurs, the redundant module replaces the faulty module. Each of these categories is analyzed in depth below.
60868 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

A. REDUNDANT SWITCHING STATES
Figure 2a depicts the space vector diagram (SVD) for a three-phase, three-level system with a DC-link input made up of two capacitors. In Figure 2a, the switching state labelled 012 indicates that the output terminal of phase A is connected to terminal 0, that of phase B to terminal 1, and that of phase C to terminal 2. Of the 27 switching states, only 19 map to unique space vectors; the remainder are redundant states located at the zero and low voltage vectors. The number of switching states grows with the number of output voltage levels. The SVD of a three-phase, four-level inverter is depicted in Figure 2b. Only 37 unique vectors exist out of a total of 64 switching states, according to [29]. Consequently, the number of redundant switching states increases along with the number of output voltage levels. This redundancy is beneficial when a switching state cannot be synthesized under fault conditions, because a redundant state can take its place. Figure 3a and 3b show the three-phase three-level topology of conventional T-type and active neutral point clamped (ANPC) inverters, respectively. When a fault occurs on the half-bridge switches of the T-type or ANPC topology, some large voltage vectors can no longer be synthesized, and in order to obtain a balanced output, the modulation index needs to be reduced so that the reference vector lies inside the inner hexagon. The redundancy available for the small and medium voltage vectors is then used to obtain a balanced output, which, however, is 50% lower than under pre-fault conditions.
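The counts quoted above (27 switching states and 19 unique vectors for three levels, 64 and 37 for four levels) can be checked by brute-force enumeration. The sketch below assumes that two switching states yield the same space vector whenever they produce identical line-to-line voltages, that is, when they differ only by a common-mode offset; the function names are illustrative.

```python
from itertools import product

def count_space_vectors(levels):
    """Return (switching_states, unique_vectors) for a three-phase inverter.

    Each phase leg can connect to one of `levels` DC-link terminals
    (0 .. levels-1). Two switching states give the same space vector when
    their line-to-line differences (a-b, b-c) are identical, i.e. when they
    differ only by a common-mode offset.
    """
    states = list(product(range(levels), repeat=3))
    unique = {(a - b, b - c) for a, b, c in states}
    return len(states), len(unique)

def redundant_states(levels, target):
    """List all switching states that share the space vector of `target`."""
    ta, tb, tc = target
    key = (ta - tb, tb - tc)
    return [s for s in product(range(levels), repeat=3)
            if (s[0] - s[1], s[1] - s[2]) == key]

for n in (3, 4):
    total, unique = count_space_vectors(n)
    print(f"{n}-level: {total} switching states, {unique} unique vectors")

# Small-vector example: states 100 and 211 produce the same line voltages.
print(redundant_states(3, (1, 0, 0)))
```

If state 211 becomes non-synthesizable because a switch that connects phase A to terminal 2 has failed, the enumeration shows that state 100 is available as its replacement, which is the property the redundant switching state FT-SS relies on.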
In [30] and [31], the dwell time of the synthesizable small voltage vectors is rearranged so that they replace the corresponding non-synthesizable vectors when a fault occurs on the half-bridge switches of the T-type inverter. The steps involved in the scheme proposed in [30] and [31] are as follows:
• Reduce the modulation index such that the reference vector lies in the inner hexagon.
• Depending on the faulty switch location, add or subtract the lowest turn-on time, Tlow, to or from the three-phase turn-on times (Ta, Tb, Tc). If two non-synthesizable switching states still exist after this second step, the third step is required; otherwise, it is skipped.
• Redefine the turn-on time to eliminate the remaining non-synthesizable switching state.
• Ensure equal dwell time for both the '0' and '2' switching states in order to achieve capacitor voltage balancing.
Figure 4 shows the SVD for a three-phase three-level topology under OC failure, with the fault on switch Sa1 of the T-type inverter. This fault makes every switching state in which phase A is connected to terminal 2 non-synthesizable (see Figure 3a). The non-synthesizable vectors are shown in red boxes, whereas the inner hexagon is marked in green. Table 1 details steps 2 and 3 of the scheme proposed in [30] and [31]. The voltage reference vector is considered to be in subsectors VIb and Ib for the considered case.
The same principle of adding or subtracting the minimum turn-on time to replace impossible switching states with redundant ones is applied to the three-phase three-level NPC topology in [32]. In the ANPC converter, a fault on the neutral path switches transforms the topology to a conventional NPC topology, resulting in a balanced three-level output waveform with the same pre-fault output rating [33]. On the other hand, for a fault on the neutral path switches in a T-type inverter, three different fault-tolerant switching schemes (FT-SS) are compared in Table 2 (considering a fault on switch Sa3, as seen in Figure 3a). Only the switching states synthesized in the particular FT-SS are represented in the SVD in Table 2.
In [34], depending on the faulty switch location, a compensation voltage is added to the reference voltage waveforms for three-level rectifiers to achieve fault-tolerant control.
In the single-phase system presented in [35], the proposed topology is capable of achieving partial fault-tolerant characteristics due to the FT-SS redundant switching states for generating the intermediate voltage level of '0.5Vdc', which can be generated by either capacitor (as shown in Figure 5a). However, the output voltage obtained under faulty conditions is only half of the pre-fault conditions. This disadvantage is addressed in [36], in which a transformer is utilized before the load to recover the output power rating (as shown in Figure 5b). However, the inner switches in the NPC leg (S2 and S3) are unable to tolerate an OC failure across them since it would disconnect the load from the inverter topology.
The redundant switching state FT-SS can be summarized as follows:
• As the number of DC-link capacitors increases, so does the number of levels in the output voltage waveform, leading to an increase in redundant switching states for the interior voltage vectors.
• In a single-phase or three-phase system, if the FT-SS is exclusively software-based, the output voltage under fault conditions is half of the pre-fault value.
• However, if a hardware modification is combined with the redundant switching state FT-SS, the output voltage under fault conditions can be restored to its initial value.
• Depending on the selected FT-SS, certain characteristics, including input current ripple, DC-link voltage ripple, and efficiency, may be affected under fault conditions.

B. DC-BUS CONNECTION VIA MID-POINT
In this configuration, the output voltage of the defective phase is connected to the DC-link's neutral point, resulting in the defective phase outputting a single voltage level. The remaining two healthy phases provide reduced voltage to a three-phase load. In [33], when a malfunction occurs on the half-bridge switches in the ANPC converter (such as switch Sa1 in figure 3b), phase-A's output is permanently connected to the neutral point (labelled '1' in figure 3b), whereas phase-B and phase-C provide a three-level output voltage. As a result, the converter operates in a reconfiguration mode of 1L-3L-3L. Reference [37] describes an analogous procedure for a conventional three-phase T-type inverter. Adjustments must be made to the reference waveforms to accommodate the 1L-3L-3L reconfiguration mode. Before the fault, equations 1 to 3 represent the reference voltage waveforms, while equations 4 to 6 represent the waveforms after the defect. The primary disadvantage of the DC-link midpoint connection scheme is that defective conditions reduce the capacity of the inverter by half.
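Equations 1 to 6 are not reproduced in this text. As a hedged illustration of the kind of reference adjustment involved, the sketch below applies a common-mode shift: the reference of the faulty phase is subtracted from all three references, so that the faulty phase reference becomes zero while the line-to-line references are preserved. This is an assumed, generic adjustment and is not necessarily the exact form used in [33] or [37]; the modulation index value is hypothetical.

```python
import cmath
import math

def prefault_phasors(m):
    """Balanced three-phase reference phasors with modulation index m."""
    a = cmath.rect(m, 0.0)
    b = cmath.rect(m, -2.0 * math.pi / 3.0)
    c = cmath.rect(m, 2.0 * math.pi / 3.0)
    return a, b, c

def postfault_phasors(refs, faulty=0):
    """Shift all references by the faulty phase's reference.

    The faulty phase reference becomes zero (its output stays clamped to the
    DC-link neutral point), while every line-to-line phasor is unchanged.
    This common-mode shift is an assumed, generic adjustment.
    """
    offset = refs[faulty]
    return tuple(r - offset for r in refs)

def line_to_line(refs):
    """Return the three line-to-line phasors (ab, bc, ca)."""
    a, b, c = refs
    return a - b, b - c, c - a

m = 0.5  # hypothetical modulation index
pre = prefault_phasors(m)
post = postfault_phasors(pre, faulty=0)

# Angle between the two healthy-phase references after the shift
angle_bc = math.degrees(abs(cmath.phase(post[1] / post[2])))
print("post-fault magnitudes:", [round(abs(p), 3) for p in post])
print("angle between healthy-phase references:", round(angle_bc, 1), "deg")
```

In this sketch the two healthy references end up 60 degrees apart and larger than the pre-fault references by a factor of the square root of three, so the modulation index has to be reduced to keep them within the linear range, which is in line with the reduced post-fault capacity of this scheme.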
A two-level three-leg inverter is used to drive an induction motor load in [38] (see Figure 6a). The load neutral is connected to the neutral point of the DC-link capacitor in an attempt to provide a path for the flow of zero-sequence current. At times of fault in one phase, the motor is maintained in balanced operation by the remaining two healthy phases. However, this scheme is only applicable when the midpoint of the DC-link capacitors is accessible. Also, since the phase currents flow into the capacitors under faulty conditions, oversized capacitors become an unavoidable issue.
In [7], the faulty phase is connected to the DC-link neutral point by triggering the corresponding TRIAC (e.g., TRIAC ta is triggered when a fault occurs on Sa1 or Sa2, see Figure 6b). The fault-tolerant control strategy implemented in [7] is predictive torque control (PTC).
A hardware-based fault-tolerant scheme is proposed in [39] for a single-phase conventional NPC topology (see Figure 7). In the proposed solution, bidirectional switches are employed to ensure that the NPC leg is clamped to the capacitor leg under faulty conditions, which ensures protection against SC failure. The proposed topology is able to preserve full output power, although with reduced waveform quality.
Fig. 8 shows the schematic diagram of a three-phase matrix converter (MC) consisting of nine bidirectional switches, which enables bidirectional power flow. An input-side filter can be added to avoid higher-order distortions in the source current. The switching devices of the MC are protected from overvoltage and overcurrent by the clamp circuit. In [40], the proposed FT-SS considers the MC as a two-stage rectifier and inverter circuit (see Figure 9) and modifies the control of the rectifier stage to counteract the fault. The same fault-tolerant method is also used to counteract the double open-switch fault in [41].
These techniques, although not requiring any additional hardware or circuit reconfiguration, do not optimize the duty cycles of the chosen switching states to reduce error in the converter's synthesized output voltages. As a result, they do not provide the MC with the best possible performance after an open-switch failure.
In the fault-tolerant approach proposed in [19], the nine bidirectional switches are divided into three groups based on the correlation between their duty cycles and phase discrepancies. Initially, the duty cycles are calculated using the approach suggested by Alesina and Venturini [20], and a non-synthesizable zero vector is replaced with a synthesizable one in the duty cycles. The duty cycles of the two remaining healthy switches in the affected phase are then adjusted according to the converter's operating conditions and the instant in time. The fault-tolerant approach suggested in [23] trades off output current quality and continuity in the presence of faults, as specified in the flowchart in Figure 10. The cost function is evaluated for the remaining eight switches, and the switching state with the minimum cost is chosen.
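The idea of evaluating a cost function only over the switching states that the healthy switches can still realize can be sketched as a generic finite-control-set predictive step. The example below is not the algorithm of [23]: the star-connected RL load, the forward-Euler prediction, the current-tracking cost, and all numerical values are assumptions chosen for illustration.

```python
from itertools import product
import math

def candidate_states(faulty_switch=None):
    """All valid matrix-converter states, minus those needing the open switch.

    A state is a tuple (iA, iB, iC): the input phase index (0..2) connected to
    each output phase. faulty_switch = (input_idx, output_idx) or None.
    """
    states = list(product(range(3), repeat=3))
    if faulty_switch is None:
        return states
    inp, out = faulty_switch
    return [s for s in states if s[out] != inp]

def predict_current(i_now, v_in, state, R, L, Ts):
    """One-step forward-Euler prediction for a star-connected RL load."""
    v_out = [v_in[idx] for idx in state]
    v_n = sum(v_out) / 3.0  # potential of the isolated load star point
    return [i + Ts / L * ((v - v_n) - R * i) for i, v in zip(i_now, v_out)]

def select_state(i_now, i_ref, v_in, faulty_switch, R=10.0, L=0.03, Ts=50e-6):
    """Pick the realizable state with the smallest current-tracking cost."""
    best_state, best_cost = None, math.inf
    for state in candidate_states(faulty_switch):
        i_pred = predict_current(i_now, v_in, state, R, L, Ts)
        cost = sum(abs(r - p) for r, p in zip(i_ref, i_pred))
        if cost < best_cost:
            best_state, best_cost = state, cost
    return best_state, best_cost

# Hypothetical operating point
v_in = [311.0 * math.cos(a) for a in (0.3, 0.3 - 2.0944, 0.3 + 2.0944)]
i_now = [0.0, 0.0, 0.0]
i_ref = [5.0, -2.5, -2.5]
fault = (0, 0)  # open switch between input a and output A

state, cost = select_state(i_now, i_ref, v_in, fault)
print("healthy candidates:", len(candidate_states()))
print("post-fault candidates:", len(candidate_states(fault)))
print("selected state:", state)
```

With one open switch, the candidate set shrinks from 27 to 18 states, and the selected state never requires the faulty device. A practical cost function would typically add further terms (for example, input reactive power), which are omitted here.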
In [42], a fault-tolerant method is presented that optimizes the duty cycles of the remaining healthy switches in a multilevel inverter. A nonlinear optimization problem is formulated to minimize the error between the desired reference voltages and the voltages generated by the switching states, and the Karush-Kuhn-Tucker conditions [43] are used to ensure that the solution satisfies the necessary constraints. The optimization adjusts the duty cycles of the functioning switches to compensate for the faulty switches, allowing the inverter to continue operating in the presence of faults with output voltages as close as possible to the desired reference voltages.
The 'DC-bus midpoint connection' FT-SS can be summarized as follows:
• When a fault occurs on the half-bridge switches, the phase containing the faulty switch is connected to the DC-link neutral point. This results in the 1L-3L-3L reconfiguration mode.
• The reference waveforms need to be readjusted in order to obtain a balanced output, which, however, is reduced by 50% compared to the pre-fault value.
• If the neutral point of the load is connected to the DC-link neutral point, the phase current flows into the capacitors. Oversizing of the capacitors is therefore an inevitable disadvantage.
• When a single switch fails in the MC topology, the control of the remaining eight switches is modified to counteract the fault by treating the MC as a two-stage rectifier and inverter circuit.

C. REDUNDANT LEG CONNECTION
In this method for achieving fault-tolerant characteristics, a redundant leg, which may be identical or distinct from the phase legs of the original circuit, is implemented. This category is subdivided into single-phase and three-phase redundant legs.

1) REDUNDANT LIMB APPROACH FOR SINGLE-PHASE SYSTEMS
The addition of TRIACs [44], semiconductor switches (single [45] or bidirectional [46], [47]), relays [48], [49], [50], or transformers [51] to incorporate fault-tolerant characteristics into the main inverter topology is a recent trend for single-phase MLI topologies. To evaluate the robustness of recently proposed redundant-leg-based single-phase fault-tolerant MLI topologies, the following factors are considered:
• Types and location of fault: According to a study on IGBT power failure in power electronic converter systems, semiconductor switches are the second most fragile components in a converter system after the capacitors [2]. Hence, throughout this paper, only switches are considered to develop faults. Failures in semiconductor switches fall into two main categories: open circuit (OC) failure and short circuit (SC) failure. Under OC failure, the faulty switch prevents certain output voltage levels from being generated, so the waveform is distorted. SC failure is more severe, since it might short the voltage source or increase the voltage stress across the remaining healthy switches. The severity of a fault also depends on its location and on the number of failed switches. Thus, the fault-tolerant topology should match the pre-fault waveform characteristics as closely as possible for all fault locations and for both single and multiple switch failures.
• Post fault performance factor (PFPF): PFPF is defined as the ratio of the post-fault inverter output power to the pre-fault inverter output power [1]. A value of 1.0 implies that the inverter is capable of preserving its output rating, which can be achieved with the same or a reduced number of levels in the post-fault output voltage waveform compared to the pre-fault waveform.
• Efficiency after failure: After the fault has occurred, the number of conducting switches should be less than or equal to the number conducting before the fault. This ensures that the efficiency of the inverter has not decreased drastically after the fault (provided the PFPF value is equal to 1.0).
• Total Blocking Voltage Ratio (TBVR): The total blocking voltage (TBV) of a multilevel inverter is the total of the maximum voltages that the inverter's individual switches must be able to withstand when they are turned off. It is an important consideration for the design and cost of multilevel inverters. The TBV is determined by the number of levels and the voltage rating of each level, and it increases as the number of levels increases. The TBV requirement affects the cost of the inverter to some extent, as higher TBV ratings typically require more robust and expensive switches. Therefore, to allow an approximate cost comparison, all the single-phase fault-tolerant topologies are compared with the conventional cascaded H-bridge (CHB) inverter. TBVR is defined as the ratio of the total blocking voltage of the fault-tolerant inverter to the total blocking voltage of the standard CHB inverter with a redundant leg. When evaluating the TBVR for a particular topology, the topology is compared with the CHB inverter that generates an equal number of output voltage levels. A high TBVR means that the approximate cost is high relative to other fault-tolerant MLIs.
Table 3 compares the recently proposed fault-tolerant single-phase topologies with a redundant leg.
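The two ratio-based metrics above reduce to simple arithmetic, sketched below. TBV is taken here as the sum of the per-switch maximum blocking voltages, which is the usual convention and is assumed for this example. The switch counts and voltage values are made up purely for illustration and do not correspond to any topology in Table 3.

```python
def pfpf(post_fault_power, pre_fault_power):
    """Post-fault performance factor: post-fault over pre-fault output power."""
    if pre_fault_power <= 0:
        raise ValueError("pre-fault power must be positive")
    return post_fault_power / pre_fault_power

def total_blocking_voltage(switch_blocking_voltages):
    """Sum of the maximum off-state voltages of all switches (same unit)."""
    return sum(switch_blocking_voltages)

def tbvr(topology_switch_voltages, chb_switch_voltages):
    """TBV of the fault-tolerant topology relative to the CHB benchmark."""
    return (total_blocking_voltage(topology_switch_voltages)
            / total_blocking_voltage(chb_switch_voltages))

# Hypothetical numbers, in multiples of Vdc, purely for illustration.
candidate = [1.0] * 6 + [2.0] * 4  # ten switches of a made-up topology
benchmark = [1.0] * 12             # made-up CHB benchmark with equal levels

print("PFPF:", pfpf(post_fault_power=1.0e3, pre_fault_power=1.0e3))
print("TBVR:", round(tbvr(candidate, benchmark), 3))
```

A PFPF of 1.0 indicates that the full output rating is preserved after the fault, and a TBVR above 1.0 indicates that the candidate needs a higher total switch voltage rating than the CHB benchmark with the same number of levels.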

2) REDUNDANT LEG APPROACH IN THREE-PHASE TOPOLOGIES
In [54] and [55], fault-tolerant operation of a conventional NPC converter is discussed. The fault-tolerant characteristics are incorporated by adding a fourth leg, a conventional three-level FC leg, which provides a stiff neutral point (under healthy conditions) for the three-phase three-level NPC topology. Three solutions are proposed, of which the first two are partial and the third is complete. However, the main disadvantages of the third solution are its higher cost and increased component count. Thyristors are used to blow the fuse in [56] and [57]. In [56], the thyristors permanently connect the output terminal of the faulty phase to the virtual neutral point (VNP) (see Figure 11). This reconfiguration is only achievable if the fault in the switching devices does not result in wire bonding fusion, which can be ensured if controlled fuses are employed in the topology [58]. In [57], capacitors CP and CN are employed to avoid short-circuiting the voltage supply when the thyristors are triggered to blow the fuse. Under faulty conditions, the redundant leg provides an active neutral point by activating the corresponding TRIACs (TA1 and TA2 are activated to connect the midpoint of the redundant leg to the NP, see Figure 12). However, the output voltage in [56] and [57] is reduced after thyristor triggering, which requires modification of the modulating signals to obtain the rated output voltage. In [59], [60], and [61], a symmetrical redundant leg is used to provide fault-tolerant characteristics to the T-type, ANPC, and matrix inverter, respectively. These topologies are capable of overcurrent sharing under healthy conditions. Note that overcurrent sharing is only possible at large space vectors. Hence, the overcurrent sharing mode would be discontinuous for a three-level inverter.
However, overcurrent sharing can be further improved if the switches in the redundant leg are made of wide-bandgap semiconductor materials (e.g., silicon carbide) [46], [59]. Owing to their neutral path switches, active NPC (ANPC) converters are inherently more reliable than NPC converters. If output power reduction is permitted, ANPC converters have 10% greater reliability for single-switch SC failure and 8.5% greater reliability for OC failure than NPC converters [55]. However, if output power loss is not permitted, both converters are equally reliable. In [9], a three-phase two-leg rectifier is used along with an identical two-level leg to incorporate fault tolerance in the system. Table 4 compares the three-phase fault-tolerant topologies with a redundant leg.
To summarize, the redundant leg approach has the following main features:
• The redundant leg can remain non-operational under healthy conditions (as in the single-phase topologies discussed above) or can operate under healthy conditions (as in the three-phase topologies discussed above) to deliver extra benefits to the original topology, such as providing an active neutral point and overload current sharing. However, the latter leads to increased conduction losses during healthy conditions.
• A common characteristic of redundant leg operation in three-phase topologies is that the redundant leg can share the overload current. This can be further aided if the redundant leg switches are made of wide-bandgap semiconductor material. However, current sharing is discontinuous since it occurs only at large space vectors; to obtain continuous current sharing, the topology must be modulated as a two-level inverter instead of a three-level inverter.
• The topology of the redundant leg in the three-phase system can be the same as or different from the phase legs of the original topology, depending on the specific requirements.
The major disadvantage of the redundant leg approach is that an entire leg is disconnected even when the fault has occurred on a single switch. Thus, for high-voltage applications with a large number of switches in each leg, the redundant leg approach may not be feasible. As a solution, recent literature reports adding redundancy to those individual switches whose healthy operation is necessary for providing a balanced output. As shown in Figure 13, in [10], each switch in the FC leg has a redundant path to ensure continuous conduction under OC failure conditions. In the multilevel active clamped (MAC) converter [62], a fault on the inner switches can be tolerated owing to the redundant paths available for producing the particular output voltage level, whereas the outermost switches are made fault tolerant by employing a parallel redundant switch, as shown in Figure 14.

D. NEUTRAL SHIFT TECHNIQUE
A fault in a cascaded multilevel converter (CMC) may leave an unequal number of active modules in each phase. This section examines the methods that have been proposed to provide balanced line-to-line voltage without adding any hardware components, along with their benefits and drawbacks. Figure 15a shows the phasor diagram of an 11-level CMC in which five healthy modules are assumed to be present in each phase (5-5-5 configuration mode, where each number denotes the healthy modules in a phase). The basic way to restore symmetry in the event of a fault, say two defective modules in phase C, is to bypass two healthy modules in each of phases A and B (see figure 15b). However, since more energy could have been drawn from the healthy modules that were bypassed, this underutilizes the system's components [63]. Fundamental phase shift compensation (FPSC), also known as the neutral-shift method, was proposed in [64]. It changes the fundamental phase angles between the output phase voltages in order to achieve balanced line-to-line voltages, i.e., |V_AB| = |V_BC| = |V_CA|. This idea is mathematically represented in equations (7)-(9), where V_A, V_B, and V_C are the magnitudes of the three phase voltages, and θ_AB, θ_BC, and θ_CA represent the angles between the three phase voltages. Evaluating equation (9) offline for two defective modules in phase C gives θ_AB = 95°, θ_BC = 133°, and θ_CA = 132° (see figure 15c). The neutral-shift approach yields a larger line-to-line voltage than the prior solution (see figure 15b).
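As a minimal sketch of how the neutral-shift angles can be evaluated offline, the law of cosines gives each line-to-line magnitude from two phase magnitudes and the angle between them; requiring the three line magnitudes to be equal and the three angles to sum to 360° fixes the solution. The function name `fpsc_angles`, the bisection search, and the use of healthy module counts as per-unit phase magnitudes are assumptions made for illustration and are not taken from [64]. The sketch covers only the case in which the converter neutral stays inside the line-voltage triangle.

```python
import math


def fpsc_angles(v_a, v_b, v_c, tol=1e-9):
    """Return (theta_ab, theta_bc, theta_ca, v_line), angles in degrees.

    v_a, v_b, v_c are the available phase-voltage magnitudes (all > 0),
    for example the number of healthy modules per phase.
    """
    pairs = [(v_a, v_b), (v_b, v_c), (v_c, v_a)]

    def angle(p, q, v_line):
        # Law of cosines: v_line^2 = p^2 + q^2 - 2*p*q*cos(theta)
        c = (p * p + q * q - v_line * v_line) / (2.0 * p * q)
        return math.acos(max(-1.0, min(1.0, c)))

    def angle_sum_error(v_line):
        # The three angles between the phase voltages must close to 360 deg.
        return sum(angle(p, q, v_line) for p, q in pairs) - 2.0 * math.pi

    # A common line magnitude must satisfy the triangle inequality per pair.
    lo = max(abs(p - q) for p, q in pairs)
    hi = min(p + q for p, q in pairs)
    if angle_sum_error(hi) < 0.0:
        raise ValueError("no balanced solution with the neutral inside the triangle")

    # Every angle grows with v_line, so the error is monotonic: bisect on it.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if angle_sum_error(mid) < 0.0:
            lo = mid
        else:
            hi = mid

    v_line = 0.5 * (lo + hi)
    angles = tuple(math.degrees(angle(p, q, v_line)) for p, q in pairs)
    return angles + (v_line,)
```

For the 5-5-3 case, `fpsc_angles(5, 5, 3)` gives angles of roughly 94.9°, 132.5°, and 132.5°, in line with the rounded values quoted in the text, and a line voltage of about 7.37 module units. That value lies between the 3√3 ≈ 5.20 obtained by bypassing healthy modules and the pre-fault 5√3 ≈ 8.66.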
However, FPSC has the following key drawbacks:
• The obtained line-to-line voltage (see figure 15c) is lower than the pre-fault value (see figure 15a).
• There is a chance that at least one phase may experience reverse power flow, depending on the load power factor [65].
• When the converter neutral point lies outside the triangle formed by the line-to-line voltages, there may be more than one solution for a given fault condition, and FPSC may not provide the best one (see figure 15d).
To bring the phase difference as close to 180 degrees as feasible, the extended FPSC approach [66] lowers the voltage of the phase with the greatest number of healthy modules (see figure 15e). Compared with basic FPSC (see figure 15d), this method produces better results. Even the extended FPSC approach, however, is unable to restore the pre-fault voltage rating. Moreover, although the common-mode voltage introduced by FPSC does not appear in the line voltages, the resulting voltage stress may still damage and age motor bearings and shafts [67].
The authors of [68] focused on reducing the common-mode component injected into the inverter phase voltages. After FPSC is applied, the phase voltages are no longer 120 degrees apart, so the third-order harmonics of the phase voltages no longer cancel naturally. In [69], selective harmonic elimination is combined with FPSC to obtain balanced line voltages and remove the third-order harmonic from the phase voltages. Compared with conventional neutral-shift (NS) approaches, the space vector modulation (SVM) method used in [70] achieves a higher line voltage under fault conditions. Reference [25] implements FPSC with a modified level-shifted pulse width modulation approach that rotates the carrier waveforms to balance the power distribution across the H-bridge cells.
To summarize the 'neutral shift' FT-SS, the following points are noted:
• The primary objective of this scheme is to achieve balanced line-to-line voltage by adjusting the phase voltage angles.
• The 'neutral shift' FT-SS scheme results in higher line-to-line voltage than the solution where an equal number of healthy modules are bypassed to maintain symmetry in the three-phase voltages.
• However, the pre-fault rating is not obtained after implementing the 'neutral shift' scheme.
• Since the phase voltages remain unbalanced, the natural elimination of third-order harmonics that was achieved under healthy conditions can no longer be accomplished.
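The loss of natural third-harmonic cancellation can be illustrated with a short worked relation. As a simplifying assumption (not taken from the cited works), let phases A and B carry the same waveform shape, with fundamental amplitude V_1 and third-harmonic amplitude V_3, displaced in time by the fundamental angles φ_A and φ_B:

```latex
\begin{aligned}
v_x(t) &= V_1 \sin(\omega t - \varphi_x) + V_3 \sin\bigl(3(\omega t - \varphi_x)\bigr),
  \qquad x \in \{A, B\} \\
v_{AB,3}(t) &= V_3 \bigl[\sin(3\omega t - 3\varphi_A) - \sin(3\omega t - 3\varphi_B)\bigr] \\
\bigl|V_{AB,3}\bigr| &= 2 V_3 \left|\sin\frac{3(\varphi_B - \varphi_A)}{2}\right|
\end{aligned}
```

With the healthy displacement of 120°, the sine argument is 180° and the third harmonic vanishes from the line voltage. With a displacement of about 95°, as in the 5-5-3 example, the argument is about 142.5°, so a third-harmonic component of roughly 1.2 V_3 remains in the line voltage.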

E. DC BUS VOLTAGE RECONFIGURATION
This FT-SS uses Z-source inverters (ZSIs) to give the inverter fault-tolerant properties. ZSIs, which get their name from the two inductors and two capacitors connected in the shape of a Z, were first presented in [69]. A conventional voltage source inverter (VSI) has two kinds of operating states, active states and zero states, and the ZSI uses the zero states to boost the DC-link voltage. Placing the Z-source network between the DC link and the VSI also provides inherent tolerance to short-circuit failures on the VSI side. The following methods may be used to meet the output voltage requirements in the event of an open-circuit fault on the inverter side:
• Increasing the DC-link voltage on the inverter side [70]: This technique alters the duty ratio of the shoot-through state, which replaces part of the inverter's zero state, to raise the input voltage to the inverter. However, this approach always increases the voltage stress on the remaining functional switches. Additionally, the input current magnitude grows, which stresses the ZSI-side inductor. The modulating waveforms are altered in accordance with the definitions given previously in equation (1) to produce a balanced line-to-line voltage.
• Modulation index, shoot-through duty ratio, and fundamental phase shift compensation (FPSC) [71]: The output voltage magnitude drops when an inverter-side fault occurs. The pre-fault output voltage is restored by increasing the modulation index and adjusting the shoot-through duty ratio accordingly. When FPSC is used, the rise in voltage stress is distributed evenly among the remaining healthy switches.
In addition to fault tolerance, ZSIs provide filtering properties. For applications such as battery-integrated PV systems, the discontinuous input current of standard ZSIs is undesirable [72]. The quasi-ZSI (qZSI) was developed to address this issue [73]. Table 5 compares several ZSIs.
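The trade-off between restored output voltage and switch stress can be sketched with the standard ideal Z-source relations: boost factor B = 1/(1 - 2D) for shoot-through duty ratio D, and peak fundamental phase voltage M·B·V_dc/2 for modulation index M. The function names and the numerical values below are hypothetical, losses are neglected, and the fault is represented only through the target voltage that must be recovered.

```python
def boost_factor(d_st):
    """Ideal Z-source boost factor B = 1 / (1 - 2*D)."""
    if not 0.0 <= d_st < 0.5:
        raise ValueError("shoot-through duty ratio must lie in [0, 0.5)")
    return 1.0 / (1.0 - 2.0 * d_st)


def peak_phase_voltage(v_dc, m, d_st):
    """Peak fundamental phase voltage M * B * Vdc / 2."""
    return m * boost_factor(d_st) * v_dc / 2.0


def shoot_through_for_target(v_dc, m, v_target):
    """Shoot-through duty ratio needed to reach a target peak phase voltage."""
    b_required = 2.0 * v_target / (m * v_dc)
    if b_required < 1.0:
        raise ValueError("target is below the unboosted output; no boost needed")
    return 0.5 * (1.0 - 1.0 / b_required)


def boosted_dc_link(v_dc, d_st):
    """Peak DC-link voltage seen by the inverter bridge, B * Vdc."""
    return boost_factor(d_st) * v_dc


# Illustrative numbers only: 400 V source, M = 0.8, 200 V peak target.
d_needed = shoot_through_for_target(400.0, 0.8, 200.0)
stress = boosted_dc_link(400.0, d_needed)
```

In this example the required duty ratio is about 0.1 and the bridge voltage rises from 400 V to about 500 V, which illustrates why the remaining healthy switches see higher voltage stress. In practice the available shoot-through time is also limited by the modulation index, a constraint this sketch does not model.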
To summarize the 'DC bus voltage regulation' FT-SS, the following points are noted:
• This hardware-based strategy requires passive components ahead of the converter structure.
• In the case of a failure, the voltage stress on the remaining components rises, so higher-rated power electronic components should be employed.
• The system can match the pre-fault characteristics, but with lower output quality.
• The 'DC bus voltage regulation' and 'neutral shift' schemes may be used together to distribute the increased voltage stress among the three phases.
[80]. Figure 16(b) is a schematic representation of a submodule (SM), which consists of two switches and one flying capacitor. Although an MMC employs a large number of power electronic devices, which may result in a high failure probability [81], its modularity gives it inherent fault-tolerant properties [82]. The reserved SMs in the MMC can remain active or inactive under healthy conditions and are classified into four categories based on their functionality.

a: SCHEME 1, COLD RESERVED SM
• In this scheme, the reserved SMs are activated only under faulty conditions.
• This scheme is ideal for HVDC applications with a large number of SMs [83] (more than 200 SMs are used in each limb of the MMC in Siemens' Trans Bay Cable project [84]). Because the reserved SMs stay idle under healthy conditions, their conduction and switching losses are avoided.
• The primary drawback of this scheme is that redundant SMs require a lengthier time to charge their capacitor to rated values in order to function as normal SMs under defective conditions [85].
• Faults in any SM [86], voltage sensor [87], or DC side [88] must be detected within a brief period of time, and a post-fault reconfiguration scheme must be implemented to ensure uninterrupted operation [89], [90]. There is a delay between the isolation of the defective module and the insertion of the reserved module following the detection of a fault. This delay could cause transitional disruptions. The reference PWM waveform is modified in [91] to eliminate any voltage distortion or overcurrent issues that may occur during the transition period.

b: SCHEME 2, HOT RESERVED SM
• Under healthy conditions, the reserved SMs are active.
• Under faulty conditions, the faulty SMs are bypassed and an equal number of healthy SMs from the remaining five limbs of the MMC are also bypassed in order to maintain symmetry in the topology.

c: SCHEME 3, HOT RESERVED SM
• As in scheme 2, the reserved SMs are active while the system is operating normally.
• Under faulty conditions, the defective SMs are bypassed along with an equal number of healthy SMs from the arm that is complementary to the arm containing the defective SMs.

d: SCHEME 4, HOT RESERVED SM
• The reserved SMs stay active even under healthy conditions.
• Under faulty conditions, only the faulty SMs are bypassed from the original topology, which leaves an unbalanced number of functional SMs. As a result, the fault-tolerant design should focus on resolving the problems with circulating current [92], energy balance [93], and capacitor voltage balancing [94] that result from the asymmetrical operation.
A common advantage of all schemes except scheme 4 is that they naturally eliminate odd-order circulating current because symmetry is maintained in the topology at all times. A common advantage of all schemes except scheme 1 is better transient performance and higher component utilization, which decreases the voltage stress across capacitors and improves reliability. However, these schemes also share the disadvantage of increased losses under healthy conditions.
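The four reserve schemes differ mainly in how many SMs remain inserted in each arm before and after a fault. The following bookkeeping sketch makes that comparison explicit. The arm names, the function `active_sms`, and the reading of "complementary arm" as the other arm of the same phase leg are assumptions made for illustration; the sketch counts SMs only and does not model capacitor charging, control, or transients.

```python
ARMS = ("A_upper", "A_lower", "B_upper", "B_lower", "C_upper", "C_lower")


def active_sms(scheme, n_nominal, n_reserved, faulty_arm=None, n_faulty=0):
    """Return {arm: number of inserted SMs} for reserve schemes 1 to 4."""
    if scheme not in (1, 2, 3, 4):
        raise ValueError("scheme must be 1, 2, 3 or 4")
    if n_faulty > n_reserved:
        raise ValueError("more faulty SMs than reserved SMs")
    if n_faulty and faulty_arm not in ARMS:
        raise ValueError("unknown arm")

    # Scheme 1 keeps the reserves idle; schemes 2-4 run them when healthy.
    healthy = n_nominal if scheme == 1 else n_nominal + n_reserved
    counts = {arm: healthy for arm in ARMS}
    if n_faulty == 0:
        return counts

    if scheme == 1:
        # Cold reserves replace faulty SMs one for one: counts are unchanged.
        return counts
    if scheme == 2:
        # The same number of SMs is bypassed in all six arms.
        affected = ARMS
    elif scheme == 3:
        # Only the faulty arm and the other arm of the same leg are reduced.
        phase, side = faulty_arm.split("_")
        other = "lower" if side == "upper" else "upper"
        affected = (faulty_arm, f"{phase}_{other}")
    else:
        # Scheme 4: only the faulty arm loses SMs, leaving an asymmetry.
        affected = (faulty_arm,)

    for arm in affected:
        counts[arm] -= n_faulty
    return counts
```

With four nominal and two reserved SMs per arm and one fault in `A_upper`, scheme 1 keeps four SMs in every arm, scheme 2 drops every arm from six to five, scheme 3 drops only the two phase-A arms to five, and scheme 4 drops only `A_upper` to five. The last case is the asymmetry that gives rise to the circulating-current and balancing issues noted above.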
In [95], a rotating sliding box controller (RSBC) is employed as a mix of the cold and hot reserved schemes: the reserved SMs participate under healthy conditions as well, but the total number of active SMs at any instant is not equal to the total number of SMs in an arm. In RSBC, the phase-shift angles depend on the length of the sliding box, while the number of sectors depends on the total number of operating SMs. After a faulty SM is detected, it is bypassed, which reduces the number of sectors without any other noticeable change in the operation of the sliding box controller. For example, if N is 4 and there are 2 redundant SMs, then under healthy conditions the total number of operating SMs in each arm is 6, and the number of sectors is also 6. With a sliding box of length 4 and a fault on the third SM, figures 17(a) and 17(b) show the sliding box controller under healthy and faulty conditions, respectively. The only disadvantage of the RSBC technique is that it is valid only if the number of faulty SMs is less than the number of reserved SMs.
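The rotating-window idea can be illustrated with a short sketch. It assumes, purely for illustration, that there is one sector per operating SM and that in each sector the box covers a run of consecutive SMs, wrapping around the end of the arm. The exact sector-to-SM mapping and the phase-shift calculation of [95] are not reproduced here, and the function names are hypothetical.

```python
def sliding_box_sectors(operating_sms, box_length):
    """List, for each sector, the SMs covered by the rotating box."""
    n = len(operating_sms)
    if box_length > n:
        raise ValueError("box is longer than the number of operating SMs")
    return [
        [operating_sms[(start + k) % n] for k in range(box_length)]
        for start in range(n)
    ]


def bypass(operating_sms, faulty_sm):
    """Remove a faulty SM from the rotation."""
    return [sm for sm in operating_sms if sm != faulty_sm]


# Example from the text: N = 4, two redundant SMs, box length 4.
healthy = sliding_box_sectors([1, 2, 3, 4, 5, 6], 4)

# Fault on the third SM: it is bypassed and the rotation shrinks by one sector.
faulty = sliding_box_sectors(bypass([1, 2, 3, 4, 5, 6], 3), 4)
```

Under these assumptions the healthy arm rotates through six sectors, starting with SMs 1 to 4, while the faulty arm rotates through five sectors that never include SM 3. The window length, and therefore the number of SMs inserted at any instant, is unchanged.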
In [96], a cold reserved SM is employed in each arm of the MMC, and the choice of reserved SM is a multilevel modular capacitor clamped converter (MMCCC) [97] (see figure 18). MMCCC facilitates smooth mode transition and provides for multiple switch fault tolerance owing to its capability to provide an increased number of output voltage levels. MMCCC offers better component utilization when compared to the flying capacitor multilevel DC/DC converter [98]. However, the main disadvantage of MMCCC is that the switches S1  and S2 (see figure 18) cannot tolerate OC and SC failures, respectively [99].
Instead of adding reserved SMs to the MMC to obtain fault-tolerant characteristics, an alternative hardware reconfiguration solution is proposed in [100], which places a varistor in parallel with the switch, as shown in Figure 19. A varistor has a high resistance when the voltage across its terminals is low and a low resistance when that voltage is high. When a switch becomes faulty, the flyback effect [47] results in high voltage stress across both the switch and the varistor due to the loss of a conduction path. The varistor, whose resistance falls under high voltage stress, compensates for this lost conduction path. However, the solution in [100] fails when the anti-parallel diode corresponding to the faulty switch is damaged.
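The voltage dependence of the varistor resistance can be made concrete with the empirical power-law model that is commonly used for such devices. The constants k and α are device-specific and are introduced here only for illustration; they do not come from [100].

```latex
I = k\,V^{\alpha}, \quad \alpha \gg 1
\qquad\Longrightarrow\qquad
R(V) = \frac{V}{I} = \frac{1}{k\,V^{\alpha - 1}}
```

Because α is much greater than 1, the static resistance R(V) falls steeply as the terminal voltage rises, which is what allows the varistor to take over conduction once the faulty switch is exposed to high voltage stress.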

2) FAULT TOLERANCE IN CASCADED MULTILEVEL CONVERTERS (CMC)
Like MMCs, cascaded multilevel converters (CMCs) (see Figure 20a) are modular and possess inherent fault-tolerant characteristics [101]. Under faulty conditions, the simplest solution is to bypass the faulty module together with an equal number of healthy modules from the other phases to maintain symmetry. However, this underutilizes the sources and is being replaced by solutions in which only the faulty module is removed using the bypass switch. The following discussion of cascaded H-bridge (CHB) based fault-tolerant strategies takes the application point of view.
The isolated voltage source for each module (see Figure 20b) can be obtained from any of the following sources: • A diode rectifier, which is fed by the dedicated secondary of the input transformer [102], or a cascaded transformer MLI [51].
• Batteries.
Isolated, low-voltage DC links make CHB converters strong candidates for PV applications. Separate DC links allow individual maximum power point tracking (MPPT) control for each PV module, which maximizes the harvested energy. However, because of unequal solar irradiance and temperature variation, operating each panel at its own maximum power point (MPP) results in PV mismatch issues [103]. PV mismatch can also arise when a fault occurs in a module belonging to a particular phase [104]. The aim of the fault-tolerant scheme for CHB-based PV applications is to provide balanced three-phase grid currents under unbalanced conditions. In [104], balanced three-phase grid currents are obtained by injecting a zero-sequence voltage component. When a fault occurs and all the remaining healthy modules generate equal power, the zero-sequence voltage is directed opposite to the current of the phase that contains the faulty module. Assuming the fault has occurred in one of the SMs in phase A, figure 21 shows the phasor diagram when the remaining healthy modules generate equal power, where V_ga, V_gb, V_gc are the three-phase grid voltages, I_ga, I_gb, I_gc are the three-phase grid currents, and V_a, V_b, V_c are the three-phase converter voltages.
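A minimal sketch of how a zero-sequence voltage can redistribute power among the phases while the grid currents stay balanced is given below. It assumes RMS phasors, a converter that delivers power to the grid, and negligible filter losses, so that the zero-sequence voltage V0 must satisfy Re(V0·conj(I_x)) = P_x - P_avg for each phase x. The function name and the per-unit numbers are hypothetical and are not taken from [104].

```python
import cmath
import math


def zero_sequence_voltage(p_gen, i_mag, i_a_angle_deg=0.0):
    """Zero-sequence voltage phasor that lets each phase deliver its own power.

    p_gen: (p_a, p_b, p_c), power generated by the modules of each phase.
    i_mag: RMS magnitude of the balanced grid currents (phase order a-b-c,
           with phase b lagging phase a by 120 degrees).
    i_a_angle_deg: angle of the phase-A current phasor in degrees.
    """
    p_avg = sum(p_gen) / 3.0
    dp_a, dp_b, _ = (p - p_avg for p in p_gen)

    # In a frame aligned with I_a, write V0 = u + j*w and solve
    #   Re(V0 * conj(I_a)) = dp_a   and   Re(V0 * conj(I_b)) = dp_b.
    # The phase-C equation then holds automatically because the three
    # power deviations sum to zero.
    u = dp_a / i_mag
    w = -(2.0 * dp_b + dp_a) / (math.sqrt(3.0) * i_mag)
    return complex(u, w) * cmath.exp(1j * math.radians(i_a_angle_deg))


# One of three modules lost in phase A, remaining modules at equal power.
v0 = zero_sequence_voltage((2.0, 3.0, 3.0), i_mag=1.0)
```

For this example V0 comes out at about 0.67 per unit and is in anti-phase with the phase-A current, which matches the direction described in the text for equal power generation in the remaining modules. With unequal generation the imaginary part `w` becomes nonzero and the direction of V0 shifts accordingly.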
Under unequal power generation, the direction of the zero-sequence component depends on the three-phase power generation ratio. The limitation of the zero-sequence component is that it results in an increase in the converter voltages, and if the converter DC-link voltages are low, it can drive the converter into over-modulation mode. Under faulty conditions, the DC-link voltage of H-bridges is used as a degree of freedom to obtain the rated voltage in CHB-based STATCOM applications [105]. The main focus of the control system in STATCOM applications is to achieve reactive power control and capacitor voltage balancing on the DC-side of the H-bridge under faulty conditions [106]. For CHB-based energy storage applications, the 'neutral shift' technique is implemented to achieve state of charge (SOC) balancing under faulty conditions [107]. In all of these techniques, the zero-sequence component has been added to achieve a particular objective. However, for low power factor (PF) applications, injecting a zero-sequence component leads to reverse real power flow and may cause hazardous conditions when the system is non-regenerative, for example, a diode bridge rectifier feeding the DC-side of the inverter [108].
Several hardware-based fault-tolerant solutions have been proposed for CMCs in recent literature. One such solution, proposed in [48], is a relay-based fault-tolerant scheme. The proposed structure includes four relays in each module, and when a fault occurs in a module, its DC source is transferred to the other module through the related relays, where it connects in series with the existing DC sources. By changing the switching patterns, the inverter can continue to function without being affected by the fault. However, this solution suffers from underutilization of the remaining healthy switches in the faulty module, increased voltage stress, and higher conduction losses due to the continuous operation of relays. Another solution, proposed in [49], overcomes the aforementioned disadvantage of increased voltage rating requirements. In this solution, the full-bridge cells are reconfigured as half-bridge cells under faulty conditions. However, the main disadvantage of this approach is the requirement of a CHB module for polarity generation, which is not fault-tolerant and whose switches have to bear the maximum voltage stress. The concept of fault tolerance for a DC-DC converter in [74] is the same as that proposed for a DC-AC converter in [48] and thus suffers from the same disadvantages. Figure 22 shows another solution proposed in [50], which involves using an additional DC source under faulty conditions. For a specific phase, when certain fault conditions occur, a command is given to the corresponding electromechanical relay to inject DC voltage while simultaneously blowing the corresponding fuse. However, the requirement of an isolated DC source is a major challenge, especially for grid-connected applications such as STATCOM [105]. A similar concept of DC voltage injection is proposed in [109], where power electronic switches replace the relays and fuses.
However, this topology also suffers from the same disadvantage of requiring an isolated DC source. Reference [110] proposes a fault-tolerant solution for an open-end winding induction motor drive based on isolated DC sources. Reference [111] proposes an SVM strategy for asymmetrical cascaded MLI under faulty conditions, which ensures that no saturation occurs whenever possible by properly choosing the high voltage vector. However, the switching sequences for power cells are defined offline and stored in lookup tables, which is a time-consuming process. In [112], switching sequences are defined online, adding extra complexity to the modulation scheme. These shortcomings are avoided in [113], where the modulation strategy combines the benefits of both space vector and carrier-based approaches. Under faulty conditions, the maximum value of the modulation index is reduced. Thus, to obtain higher amplitude line-to-line voltages, the converter must deliberately over-modulate, which may result in converter saturation. The operation of the cascaded MLI under fault conditions on the converter power cells is extended in [114], where the proposed modulation strategy allows for a wider range of modulation index.
To summarize the 'redundant module configuration' FT-SS, the following points are noted:
• Redundant modules in MMC can be active or inactive based on the application requirement.
• The main objective of a fault tolerance scheme for the MMC is to handle the issues, such as capacitor voltage balancing and circulating current, that arise when an uneven number of modules operate in the six arms of the MMC.
• Similar to MMC, fault tolerance schemes in CMC should aim at handling the asymmetric nature to maintain balanced three-phase grid current for grid-connected PV applications, SOC for battery applications, or capacitor voltage balancing in STATCOM based applications.

IV. FUTURE SCOPE
• The fault-tolerant redundant switching states scheme alone is unable to deliver rated output voltage during faulty conditions. Thus, further research is needed to combine it with hardware-based techniques to achieve the rated output voltage.
• Single-phase redundant leg fault-tolerant switching schemes have received less attention in terms of switch sequencing before and after failure, which directly affects the system's efficiency. While some topologies assume that the fuse will blow open to convert a short circuit failure to an open circuit one, further research is required to study the transition from the short circuit failure to the blowing of the fuse. Several aspects, such as the voltage stress on remaining healthy switches, require further discussion.
• Unlike three-phase systems, single-phase systems do not utilize redundant legs in healthy conditions. However, potential benefits from such a redundant leg architecture can be explored in single-phase systems.
• The total blocking voltage ratio for three-phase systems can be further improved by adding extra devices to eliminate short circuit failures.
• The implementation of the fundamental phase shift compensation technique has reduced the burden on switches and inductors in Z-source converters. However, further research is required for single-phase systems to develop modulation strategies that can deal with this issue.
• Semiconductor switches and capacitors are the two most vulnerable components in power electronic systems. However, capacitor failure is not widely discussed in the literature, and most topologies only consider single switch open circuit or short circuit failure. Only a few studies discuss multiple switch failures.

V. CONCLUSION
This paper presents a detailed analysis of fault-tolerant MLI topologies proposed in recent literature. The software-based fault-tolerant strategies, such as redundant switching states, DC-link mid-point connection, and neutral shift, are relatively simple to implement and cost-effective. However, these strategies are unable to fully restore the pre-fault waveform characteristics and magnitude. In contrast, hardware-based strategies, like redundant leg and redundant module configuration, are capable of achieving the same pre-fault voltage magnitude even under faulty conditions, making them more suitable for critical applications. Moreover, implementing both software and hardware-based schemes, such as neutral shift and DC bus voltage regulation, together may result in better output waveform characteristics compared to implementing each scheme individually. As a concluding note to this paper, the key points and pitfalls of each fault-tolerant strategy are listed as follows:
• The redundant switching states FT-SS does not incur additional implementation cost, provided that a reduction in the output magnitude is acceptable. The original topology already possesses inherent fault-tolerant characteristics due to the redundancy of inner voltage vectors in the space vector diagram of the three-phase system.
• With the DC bus midpoint connection FT-SS, the phase containing the fault is connected to the midpoint of the DC bus. As a result, a two-phase system drives the original three-phase system with a reduced output rating.
• The redundant leg-based approach involves adding an extra leg to the original three-phase system, which can remain operational or non-operational under healthy conditions. Although this approach increases the cost of the original topology, it meets the pre-fault waveform characteristics. Recent literature has proposed this approach in single-phase systems, where it preserves the output rating under faulty conditions.
• The neutral shift software-based approach modifies the phase angles of unbalanced phase voltages to achieve a higher and balanced line voltage. However, the main drawback is that the obtained line voltage magnitude is lower than the pre-fault line voltage magnitude. The natural third-order harmonic elimination achieved with balanced phase voltages is no longer achievable after implementing the neutral shift scheme.
• With DC bus voltage regulation FT-SS, passive components are used before the conversion/inversion stage to match the pre-fault characteristics with reduced output quality. Implementing DC bus voltage regulation FT-SS with the neutral shift scheme results in the uniform sharing of increased voltage stress on the three phases.
• Redundant module FT-SS involves adding an additional module in series with the original topology, which can remain operational or non-operational under healthy conditions. The main focus of the control system should be on achieving capacitor voltage balancing, real and reactive power control, and providing balanced three-phase grid currents under faulty conditions.