Overview on Reliability of Modular Multilevel Cascade Converters

Multilevel converters are used extensively in modern industry, which calls for energy conversion at high power and at high or medium voltage. Because of its modularity and scalability, the multilevel converter with a modular structure can be extended to different voltage levels and takes a variety of forms in practical applications. It has attracted much attention from academia in the past decade. However, because of the numerous vulnerable power electronics sub-modules, significant challenges remain with regard to reliability. After summarizing the current research status of modular multilevel cascade converters, this paper reviews the main reliability issues. Firstly, the failure cases are surveyed and classified, and the main failure causes are analyzed. Secondly, the reliability evaluation methods are reviewed and applied to the modular multilevel cascade converters. Thirdly, some promising measures to improve the reliability are presented and discussed, including parameter selection, redundancy design, fault-tolerant control and so on. Then, a complete reliability-oriented design procedure for the modular multilevel cascade converters is proposed. Finally, the challenges and opportunities in improving the reliability are summarized.


Introduction
Multilevel converters have attracted increasing attention for more than three decades as one of the preferred choices of electric energy conversion for high-power applications [1], such as wind power, marine propulsion, train traction, FACTS and HVDC transmission. Although the development of higher-voltage and higher-current power semiconductor switches (e.g. the latest-generation devices, at around 6.5 kV and 2.5 kA) makes conventional two-level converters capable of driving high power, multilevel converters still present great advantages because they can achieve high power using mature medium-power semiconductor devices [2] and without series connection of devices [3]. Moreover, the high quality of the output voltage waveforms of multilevel converters also makes them very attractive to industry and academia. Researchers all over the world have done a lot of work to improve the performance of multilevel converters [4][5]. Numerous multilevel converter topologies have been proposed for different applications, and their classification is shown in Fig. 1. It can be concluded that multilevel converters belong to the high-power voltage-source converter family, and that the neutral-point clamped (NPC), flying capacitor (FC) and cascaded topologies are the classical and most common ones. Dozens of new topologies have been proposed in the literature, and most of them are variations of the three classical topologies, or hybrids of them [1].
The modular multilevel converter (MMC) was proposed in the early 2000s by Prof. R. Marquardt [6][7] and can be seen as a variation of the cascaded topology. Since it is particularly suitable for HVDC systems, it immediately made its way into industry. It has been a hot research topic in the past decade, and many papers have been published on the MMC [8][9][10]. Because of its modularity and scalability, the output voltage of the MMC can in theory be extended to different voltage levels (e.g. hundreds of levels). Compared with NPC and FC converters, this feature creates minimal harmonics in the output voltage, which makes bulky transformers or filters unnecessary. Therefore, the MMC is very appropriate and promising for high-power applications, including HVDC transmission [11], medium-voltage motor drives [12], FACTS [13], wind turbine generation [14], large-scale PV grid interfaces [15], charging stations [16], integrated energy storage [17] and so on.
After nearly two decades of research on multilevel converters with a modular structure, a family of topologies has formed that is based on the cascaded connection of multiple half-bridges, full-bridges or other kinds of sub-modules (SMs). Akagi [18] defined this family as the modular multilevel cascade converter (MMCC), with four original family members: single-star bridge-cells (SSBC), single-delta bridge-cells (SDBC), double-star chopper-cells (DSCC, or the typical MMC) and double-star bridge-cells (DSBC). The family has continued to grow in recent years. The triple-star bridge-cells (TSBC) [19], also named the "modular matrix multilevel converter (MMMC)", is composed of 9 branches of full-bridge SMs and is suitable for 3Φ-to-3Φ AC-to-AC bidirectional direct power conversion. Aimed at the same function, the modular multilevel converter in hexagonal configuration (also called the "hexverter") uses only 6 branches of full-bridge SMs [20]. Additionally, DC-DC modular multilevel converters with medium-frequency interlinked transformers are attractive candidates for HVDC systems and the DC grid [21][22]. In general, these emerging topologies are all based on strings of multiple cascaded SMs, and the strings are referred to as "clusters" in this paper.
Fig. 1 Multilevel converter classification [1]
As to its shortcomings, the MMCC always has a large number of SMs per cluster, which leads to some challenges [23]. Firstly, complex control is needed, including SM capacitor voltage balancing control and circulating current control [24]. Secondly, because of the significant fluctuation in the capacitor voltages [25], bulky capacitors are necessary in every SM, which greatly increases the system cost. Researchers have published many papers dealing with these two problems. Another important challenge for the application of MMCC is reliability. Since numerous power electronics SMs are used, there are many potential failure points, resulting in low reliability of the whole system. On the other hand, MMCC plays an important role in modern industry, and the cost impact is enormous or even catastrophic when an MMCC fails. Therefore, the reliability of MMCC has attracted increasing attention from researchers, and schemes [26][27][28][29][30][31][32] have been proposed to improve it, including redundancy design, fault diagnosis, fault-tolerant control and so on. These measures all aim to enhance the system reliability through fault tolerance. For the mass application of MMCC, research on reliability is still at an early stage and requires further work. Against this background, the reliability issues of MMCC and the current research status are reviewed in this paper. The conventional methodology for reliability estimation and the measures to improve reliability are summarized and applied to MMCC.
The rest of the paper is organized as follows. Section 2 presents a brief review of the latest achievements of MMCC regarding the topologies, SM circuits, and control and modulation schemes. Then, the failure cases of power electronics converters are surveyed and analyzed in Section 3. Section 4 provides the reliability evaluation methodology for MMCC. Section 5 introduces measures to improve the reliability of MMCC from the design phase to the operation phase, including parameter selection, redundancy design, fault diagnosis, fault-tolerant control and reliability-oriented design. Finally, Section 6 presents the challenges and opportunities for the reliability of MMCC.

Review of modular multilevel cascade converters
Since the middle of the 1990s, Robicon Corporation has applied the multilevel inverter based on the cascaded connection of modular full-bridge SMs to medium-voltage motor drives [33]. The converter requires a complicated multi-winding phase-shifted transformer, and it can be seen as the earliest multilevel topology with a modular structure. After that, Peng proposed a similar topology for STATCOM by replacing the complicated transformer with distributed floating capacitors [34]. These two are the initial family members of MMCC. The age of MMCC truly began after Prof. Marquardt proposed the typical MMC. A large number of papers about MMCC have since appeared, aiming to understand the working principle, to explore different topologies, SM circuits, applications and control strategies, to build simplified models and to improve the performance. The topologies, SM circuits and control strategies are the three main aspects that influence the reliability, and they are reviewed in this section.

Different topologies
The typical 3Φ MMC (Fig. 2) has 6 arms or clusters. The clusters are connected to each other by the cluster inductors, which are inserted to suppress the circulating current and the fault current [35]. Every cluster, composed of a string of cascaded half-bridge SMs, can be seen as a controlled voltage source. By adjusting the clusters' output voltages, the output current and the circulating current can be controlled. Furthermore, the output current can be seen as the differential-mode current of two adjacent clusters, and the circulating current as the common-mode current, which illustrates that the two currents are independent variables. The output current should be controlled to satisfy the requirement of the given operating condition, while the circulating current is a degree of freedom that can be used to reduce the capacitor voltage fluctuation [36] and to balance the capacitor voltages [37]. These principles also apply to other family members of MMCC.
Besides the typical MMC, other family members are also very promising for practical applications. Four circuit configurations of MMCC are illustrated in Fig. 3 for comparison. Fig. 3(a) shows the classical STATCOM, which is composed of three clusters of full-bridge SMs. Three clusters are named a group here, shown in the red box. Thus, the typical MMC in Fig. 3(b) includes two groups of clusters. When the clusters are made up of half-bridge SMs, this is the most common topology for DC-to-3ΦAC conversion. Alternatively, the MMC with full-bridge SMs can be used for either DC-to-3ΦAC or 1ΦAC-to-3ΦAC conversion [38][39]. The hexverter in Fig. 3(c) and the MMMC in Fig. 3(d) use 2 and 3 groups of clusters respectively, both with full-bridge SMs, and both are used for 3ΦAC-to-3ΦAC conversion. The difference is that the 6 clusters of the hexverter are assembled head-to-tail, whereas the 9 clusters of the MMMC are arranged as a matrix. Therefore, the MMMC has more degrees of control freedom than the hexverter: there is only one circulating current in the hexverter [20], but there are 4 independent circulating currents in the MMMC [19]. Another degree of control freedom for the two topologies is the common-mode voltage between the neutral points of the two 3ΦAC systems. By adjusting the circulating currents and the common-mode voltage, the capacitor voltages can be balanced and their fluctuation reduced.
At present, the topologies in Fig. 3(a) and Fig. 3(b) are quite mature and have been used in commercial applications, such as STATCOM and HVDC transmission. In contrast, the hexverter and the MMMC are still at the research stage. The hexverter, the MMMC and the back-to-back MMC are all promising candidates for medium-voltage motor drives. Their drawback is that they are all limited by start-up, low-frequency or synchronous operation [40].
Some alternative topologies of the typical MMC have been proposed in the literature, and some examples are shown in Fig. 4. By means of the middle SM, the MMC in Fig. 4(a) can reduce the number of required SMs while producing the same voltage levels [41]. The alternate arm MMC in Fig. 4(b) is a hybrid between a two-level converter and the typical MMC [42], and the hybrid MMC in Fig. 4(c) combines two two-level converters with the typical MMC [43]. Although these topologies improve the performance of the MMC, they sacrifice modularity, which is likely to reduce the reliability of the whole converter [10].
Fig. 3 Four circuit configurations of MMCC: (a) STATCOM; (b) typical MMC; (c) hexverter; (d) matrix modular multilevel converter [19]
Fig. 4 Modified topologies of the typical MMC [10]: (a) middle-SM MMC; (b) alternate arm MMC; (c) hybrid MMC
Fig. 5 Components of a MMC 2-level SM showing capacitors, semiconductor switches, bleeder resistors, protection thyristor, bypass switch, and all the signal exchange between the SM and the system-level control system [44]
The MMCC family has a large number of members. In summary, most family members can be derived by replacing the single switch in a classical converter with a cluster. For example, by replacing all the switches in the 3Φ two-level converter with clusters, the typical MMC is obtained, and the MMMC corresponds in the same way to the matrix converter. This principle can be used to obtain more practical topologies with a modular structure. Hybrid topologies that combine the MMCC with conventional converters have also emerged and received much attention. Whether these advanced topologies can make their way into real applications depends heavily on their reliability.
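The differential-mode and common-mode view of the two cluster currents of one phase leg can be illustrated with a short sketch. The sign convention used here (output current as the difference of the upper and lower cluster currents, circulating current as their mean) is one common choice and is an assumption, as are the function and variable names.

```python
def decompose_cluster_currents(i_upper, i_lower):
    """Split the two cluster currents of one phase leg into two components.

    Assumed convention (not taken from the paper):
      i_out  = i_upper - i_lower        (differential mode, output current)
      i_circ = (i_upper + i_lower) / 2  (common mode, circulating current)
    """
    i_out = i_upper - i_lower
    i_circ = (i_upper + i_lower) / 2.0
    return i_out, i_circ


def recompose_cluster_currents(i_out, i_circ):
    """Inverse mapping: rebuild the two cluster currents from the components."""
    i_upper = i_circ + i_out / 2.0
    i_lower = i_circ - i_out / 2.0
    return i_upper, i_lower


if __name__ == "__main__":
    i_out, i_circ = decompose_cluster_currents(130.0, -70.0)
    print("output current:", i_out, "circulating current:", i_circ)
    # Changing only the circulating current leaves the output current untouched.
    i_up, i_lo = recompose_cluster_currents(i_out, i_circ + 10.0)
    print(decompose_cluster_currents(i_up, i_lo))
```

Because the mapping is invertible, any pair of cluster currents corresponds to exactly one pair of output and circulating currents, which is why the two can be regulated independently.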

Different SM circuits
Besides the topology, the SMs, or building blocks, also take a variety of forms. The basic SM of the typical MMC, a half-bridge converter, is shown in Fig. 5, including capacitors, semiconductor switches, bleeder resistors, protection thyristor, bypass switch, and other components for the signal exchange. Although the simplest half-bridge SM provides the highest efficiency and the lowest cost, it does not provide any DC-fault blocking or ride-through capability [44].
For other kinds of SMs, the peripheral components are similar. The difference is that they have different topologies (Fig. 6), such as the full-bridge SM, the asymmetrical double commutated SM, the cross- or parallel-connected SM, the clamped-double SM, and the FC- and NPC-type SMs [45]. In order to optimize the design in terms of DC fault handling capability, efficiency and cost, a hybrid design with half-bridge and other bipolar SMs can be used, borrowing features of various SM circuit configurations [46]. Considering the reliability of a single SM, the simple half-bridge SM seems the best; but when the system reliability and the cost are considered, the hybrid design may be better. These forms of SMs were first proposed for the typical MMC in HVDC systems, and the feasibility and applicability of using them in other MMCC topologies remain to be explored.

Control and modulation schemes of MMCC
The system-level control of MMCC is mature and is the same as that of general voltage-source converters. However, SM capacitor voltage balancing control is crucial in MMCC because there are so many SMs. Taking the typical MMC as an example, there are mainly two categories of capacitor voltage balancing control methods (Fig. 7): balancing control prior to the carrier phase shift (CPS) modulation by adjusting the reference [47], and balancing control by a sorting algorithm after the staircase modulation [48]. The latter is more common for the MMC, but for the cascaded H-bridge converter, the hexverter and the MMMC, which have fewer SMs, the former method is more practical.
Fig. 6 Various SM topologies [45]
Fig. 7 Capacitor voltage balance control and modulation schemes of MMC [10]: (a) balancing control prior to CPS modulation by adjusting reference; (b) capacitor voltage balancing by sorting algorithm after staircase modulation
Fig. 8 Field experiences of a PV plant [49]: (a) unscheduled maintenance events by component; (b) unscheduled maintenance costs by category
The CPS modulation distributes the loss evenly among the SMs [47]; in contrast, the staircase modulation with the sorting algorithm leads to different switching counts for different SMs, and thus to an uneven loss distribution. When the redundancy setup is neglected, the system reliability depends on the most fragile SM. Therefore, an uneven loss distribution is not preferred from the reliability point of view. However, when the number of SMs is very large, even a very low switching frequency can satisfy the operating requirement. Under this condition, the CPS modulation is unnecessarily complex, and the staircase modulation is preferred.
Circulating current control is another issue that should be considered. Even-order harmonics of the circulating current can be suppressed [24] to reduce the RMS value of the arm currents; as a result, the system loss is decreased and the reliability increased. On the other hand, a certain amount of 2nd or even 4th order circulating current can be injected to reduce the capacitor voltage fluctuation [25], allowing for smaller capacitors. Which arrangement brings the greater benefit depends on the practical situation. In summary, to enhance system reliability, the control and modulation strategies need to be carefully selected according to the application.
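The sorting-based balancing that follows the staircase modulation can be sketched as below. This is a minimal illustration of the widely used selection rule, not the specific implementation of [48]; the function name, the argument names and the convention that a positive arm current charges the inserted capacitors are assumptions.

```python
def select_submodules(cap_voltages, n_insert, arm_current):
    """Pick which SMs of one cluster to insert for the next step.

    cap_voltages: measured capacitor voltage of every SM in the cluster
    n_insert:     number of SMs requested by the staircase modulator
    arm_current:  cluster current; assumed positive when it charges
                  the capacitors of inserted SMs
    Returns the sorted indices of the SMs to insert.
    """
    if not 0 <= n_insert <= len(cap_voltages):
        raise ValueError("n_insert must lie between 0 and the number of SMs")
    # Charging current: insert the lowest-voltage SMs so that they catch up.
    # Discharging current: insert the highest-voltage SMs so that they drop.
    discharging = arm_current < 0
    order = sorted(range(len(cap_voltages)),
                   key=lambda i: cap_voltages[i],
                   reverse=discharging)
    return sorted(order[:n_insert])


if __name__ == "__main__":
    voltages = [1010.0, 990.0, 1005.0, 995.0]
    print(select_submodules(voltages, 2, arm_current=+50.0))
    print(select_submodules(voltages, 2, arm_current=-50.0))
```

Re-sorting at every step is what produces the different switching counts among SMs mentioned above, since an SM is switched whenever its rank in the ordering changes.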

Survey on failure cases of modular multilevel cascade converters
According to five years of field experience in a large, utility-scale PV generating plant [49], the inverters were most responsible for unscheduled maintenance and the associated cost, as shown in Fig. 8. From this, it can be concluded that, within the whole system, the power converters are the most likely to fail, which makes them the top priority when considering the system reliability. Focusing on the power converters, this section first collects and classifies the failure cases in the literature. Then the most vulnerable components and their failure modes are analyzed. After that, the major and unique failure causes of MMCC are introduced.

Failure cases and classification
Failures of power converters can result from many causes, such as bad connections between different units, packaging defects, component damage and harsh operating conditions. The main failure causes are illustrated in Fig. 9 by a fishbone diagram [50].
Based on how the system changes when the fault occurs, the failure cases are classified into two categories (Fig. 10): structure variation failures and parameter degradation failures. Structure variations usually refer to short-term changes caused by over-voltage operation, short-circuit faults or other abnormal operation, which are often disastrous and need immediate protection or fault tolerance. In contrast, parameter degradation is usually due to the long-term accumulation of vibration, voltage, current or temperature stress, which gradually deteriorates the performance of the converter. As time goes by, parameter degradation may develop into a serious structure variation. In Fig. 10, the failure causes marked with the dark gray background are the primary ones and deserve particular attention.
Fig. 9 Fishbone diagram illustrating the failure causes [50]
Fig. 10 Classification of the failure cases of power converters

Vulnerable components and failure modes
At the component level, the semiconductor devices and the capacitors are the most fragile components. This conclusion has been verified by an industry-based survey on the fragile components in power converters, as shown in Fig. 11 [51].
Considering the two most reliability-critical components in the SM of MMCC (e.g. the half-bridge SM in Fig. 5), IGBTs and capacitors, the main failure modes are introduced as follows. For the IGBT, an overview of catastrophic failures is shown in Fig. 12, which includes the dominant short-circuit, open-circuit and intermittent gate-misfiring failures [52][53]. Catastrophic failures are difficult to predict and handle, because they are often induced by a single overstress event. Another failure mechanism of the IGBT is wear-out failure. There are many weak points in wire-bond IGBT modules, including the interconnection between the wire bond and the silicon, the solder joint between the silicon and the direct copper-bonded (DCB) substrate, and the solder joint between the DCB substrate and the base plate [54]. Because different materials have different coefficients of thermal expansion, long-term continuous thermal cycling will cause wire-bond lift-off or solder joint fatigue. The third IGBT failure mechanism is cosmic radiation [55][56], which depends on altitude and can be ignored in low-elevation conditions. Generally, the catastrophic failures have attracted a lot of attention, and many fault diagnostic and protection methods [52] have been proposed. In contrast, wear-out failures due to inappropriate use can clearly shorten the IGBT's lifetime, but they have rarely been the focus of previous work.
Fig. 11 Survey of different fragile components responsible for converter failure in power electronic systems [51]
Fig. 12 Overview of IGBT catastrophic failures [52][53][54]
Fig. 13 Performance comparisons of the three main types of capacitors [57]
For capacitors, three types are often used for DC links in power converters: Aluminum Electrolytic Capacitors (Al-caps), Metallized Polypropylene Film Capacitors (MPPF-caps) and Multilayer Ceramic Capacitors (MLC-caps). The performance of the three types is compared in Fig. 13, which illustrates that Al-caps have the highest power density and the lowest cost; MPPF-caps provide a well-balanced performance in terms of ESR, voltage stress, reliability and cost; and MLC-caps can operate at higher frequency and higher temperature [57]. For MMCC, which often has a very large capacitor voltage fluctuation, the capacitor banks are always composed of Al-caps or MPPF-caps [58][59], and a hybrid design with both Al-caps and MPPF-caps seems very promising in practice. By taking advantage of their different frequency characteristics, the overall reliability and cost can be improved [57]. In addition, the dominant failure mode of Al-caps and MPPF-caps is open-circuit, which is favorable for the fault tolerance of the capacitor banks.

Failure causes of modular multilevel cascade converters
A large number of power semiconductor devices, capacitors, inductors and other auxiliary components are used in MMCC, and each one can be considered a potential failure point. Fortunately, because of the modular structure, the failure analysis of one SM can represent all the SMs. Taking the half-bridge SM in Fig. 5 as an example, it contains the following critical components:
• IGBT module, which includes the IGBT and diode dies in one package. The press-pack IGBT modules often have better reliability, higher power density and better cooling capability, but higher cost [54]; thus, conventional wire-bond modules are still widely used in MMCC.
• DC capacitors, including DC-link and snubber capacitors. Both Al-caps and MPPF-caps can be used as the DC-link capacitors, but only MPPF-caps can be used as snubbers.
• Protection switch, which can be one TRIAC or two anti-paralleled thyristors [60], and the vacuum breaker is often used as the bypass switch.
• Bus-bar, which is composed of two conductor layers and a critical insulation layer to bear the high voltage.
• Gate driver for semiconductor devices, which consumes power and is easily subject to failure, as it needs to have high voltage isolation capability and EMI immunity.
The above components are the most likely to induce failures in the SMs and require special consideration in the design phase. They are also the key points for the later reliability estimation. Apart from these critical components in the SMs, the interconnections between SMs, the control system, the signal transmission path and other auxiliary components are also potential failure causes, but they are less likely to induce faults.
When half-bridge SMs are adopted, the DC fault of the typical MMC is one of the major challenges for HVDC applications [45], and it greatly threatens the system reliability. The existing solutions to interrupt and clear the DC-side short-circuit fault include: the AC-side circuit breaker, which is not sufficiently fast; the DC-side circuit breaker, which is not yet mature enough; and the use of bipolar SMs (e.g. Fig. 6(b)-(d)), which have DC-side fault blocking capability [61]. The current flow paths in the MMC with half-bridge or full-bridge SMs under a DC pole-to-pole fault are illustrated in Fig. 14, which demonstrates the fault blocking capability of the bipolar SMs. It can be seen that for the MMC with half-bridge SMs, the fault current flows through the anti-parallel diodes when the fault occurs and all the switches are blocked. For the MMC with full-bridge SMs, the short-circuit current flows through the capacitors, and the capacitor voltages oppose the AC-side voltage, which limits the increase of the current. Thus, the DC fault can be blocked. In conclusion, using bipolar SMs is a cost-effective and reliable way to realize DC-side fault ride-through.

Fig. 14 Fault current path in MMC under a DC pole-to-pole fault [61]

Reliability evaluation of modular multilevel cascade converters
The general and theoretical methodology to calculate the reliability of power converters has been provided in [62-66]. The methodology should be modified slightly when applied to estimating the reliability of MMCC, considering the features of modularity and redundancy.

Failure rate and reliability calculation methods
The failure rate of an item is the mean number of failures per unit time, which is an indication of the "proneness to failure" of the item after time t has elapsed [63]. The failure rate function follows the well-known bathtub curve, shown in Fig. 15. The life cycle of an item can be divided into three periods: burn-in, useful lifetime and the wear-out period.
When carrying out reliability analysis, the useful lifetime with an assumed constant failure rate is the focus. The relationship between the failure rate λ(t), the reliability function R(t) and the equivalent lifetime or the mean time to failure (MTTF) is shown by (1). Conventionally, for power converters, the failure rate of each component is calculated either through an empirical model such as RIAC-HDBK-217Plus [67], or through the datasheets provided by the manufacturers [68][69]. The emerging physics-of-failure (PoF) methodologies consider the components' physical failure mechanisms and seem to be more accurate [70]. However, the PoF methods involve a detailed and extensive assessment of fatigue and stresses, which requires a significant investment. Thus, handbook- or datasheet-based methods will not be completely abandoned in the near future. Taking RIAC-HDBK-217Plus as an example, the failure rate of the IGBT is given by (2), where λ_0 is the failure rate reference at T_j = 100°C, and the other variables are the factors of temperature, voltage, environment and quality respectively. Substituting the operating conditions and parameters into this model, the failure rate of the IGBT can be obtained. Similarly, the failure rates of the other critical components mentioned in Section 3.3 can also be calculated.
Fig. 15 Bathtub-type curve depicting the three regions of the failure rate [62]
Fig. 16 State space method based on the Markov-chain model
To map the component-level failure rate to the system-level reliability, three methods are widely used: the reliability block diagram (RBD), fault-tree analysis (FTA) and state-space analysis (e.g. the Markov-chain model). Since modularity and redundancy are the major features of MMCC, stress changes are very common when fault tolerance takes place. Therefore, the Markov-chain model is very suitable and can effectively capture the reliability dynamics of numerous components under varying stress conditions. The principle of Markov-chain analysis [71] is illustrated in Fig. 16.
For any given system, a Markov-chain model consists of a list of all possible reliability states of that system, the possible transition paths between those states and the failure rates of those transitions. Taking State k as an example, there are four states which flow into or out of it. The rate of change of the probability of State k satisfies the relationship given by (3), where λ represents the failure rates and μ represents the repair rates.
Maintenance is not taken into consideration in this paper, thus the repair rate is neglected here. Each state in the model has an equation like (3), and these equations together determine the behavior of the overall system. By substituting the initial probability values and solving all these equations, the probabilities of all the system states can be obtained. Furthermore, the reliability or the system-level failure rate can be derived.
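For the useful-life period with a constant failure rate λ, the standard relations are R(t) = exp(-λt) and MTTF = 1/λ. The sketch below evaluates them and scales a reference failure rate by stress factors. The purely multiplicative factor model and all numerical values are assumptions for illustration only and are not taken from RIAC-HDBK-217Plus.

```python
import math


def reliability(failure_rate, t):
    """Reliability R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failure_rate * t)


def mttf(failure_rate):
    """Mean time to failure for a constant failure rate: 1 / lambda."""
    return 1.0 / failure_rate


def scaled_failure_rate(base_rate, factors):
    """Scale a reference failure rate by stress factors.

    Assumption: the factors (temperature, voltage, environment, quality)
    combine multiplicatively; the actual handbook model may differ.
    """
    rate = base_rate
    for factor in factors:
        rate *= factor
    return rate


if __name__ == "__main__":
    fit = 1e-9  # 1 FIT = one failure per 1e9 device-hours
    # Hypothetical numbers: 100 FIT reference rate and four stress factors.
    lam = scaled_failure_rate(100 * fit, [2.0, 1.5, 1.0, 1.0])
    print("failure rate per hour:", lam)
    print("MTTF in hours:", mttf(lam))
    print("reliability after one year:", reliability(lam, 8760.0))
```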
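The state equations of a Markov-chain model can be assembled and solved numerically once the transition rates are listed. The sketch below uses a plain fourth-order Runge-Kutta integration in place of a library solver; the data layout (a dictionary of transition rates keyed by state pairs) and the function names are assumptions. Repair transitions, if needed, would simply be additional entries pointing back to a healthier state.

```python
import math


def markov_derivative(p, rates):
    """Return dP/dt for state probabilities p.

    rates maps (i, j) to the transition rate from state i to state j.
    Each transition removes probability flow from i and adds it to j.
    """
    dp = [0.0] * len(p)
    for (i, j), rate in rates.items():
        flow = rate * p[i]
        dp[i] -= flow
        dp[j] += flow
    return dp


def integrate_markov(p0, rates, t_end, steps=1000):
    """Integrate the state equations from 0 to t_end with classical RK4."""
    p = list(p0)
    h = t_end / steps
    for _ in range(steps):
        k1 = markov_derivative(p, rates)
        k2 = markov_derivative([x + 0.5 * h * k for x, k in zip(p, k1)], rates)
        k3 = markov_derivative([x + 0.5 * h * k for x, k in zip(p, k2)], rates)
        k4 = markov_derivative([x + h * k for x, k in zip(p, k3)], rates)
        p = [x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


if __name__ == "__main__":
    # Two-state check: healthy -> failed at a rate of 0.1 per unit time.
    p = integrate_markov([1.0, 0.0], {(0, 1): 0.1}, t_end=10.0)
    print(p[0], math.exp(-1.0))  # numerical and analytical survival probability
```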

Reliability calculation of MMCC
The procedure to calculate the reliability of MMCC includes three steps: calculate the SM's reliability based on the Markov-chain model, then calculate the cluster-level reliability and finally the system-level reliability based on the k-out-of-n model. Take the design of the half-bridge SM as an example (Fig. 17), where the capacitor bank is composed of four capacitors connected in parallel in order to meet the capacitance requirement [72]. The assumptions for demonstrating the reliability calculation are as follows:
• The original failure rate of each of the four capacitors is λ_cap0, which is valid when the four capacitors share the total ripple current. When one capacitor fails open-circuit, the rate changes to λ_cap1.
• The remaining 3 capacitors can share the total current without exceeding their thermal limits, and the resultant increased voltage ripple is still within the tolerable limits for the IGBTs.
• Once any 2 capacitors fail, the capacitor bank is considered failed, as both the current and the voltage limits will be exceeded.
For the other critical components, which do not have redundant backups, the failure rates are also shown in Fig. 17.
Fig. 17 Half-bridge SM design and its Markov-chain model [72]
In Fig. 17, the Markov-chain model of the SM contains 3 states: the original State 0, State 1 with one capacitor failed and State 2 with the whole SM failed. The failure rates of the transition paths between the 3 states, such as the rate λ_01 from State 0 to State 1, which is set by the capacitor failure rate λ_cap0, are given by (4) [72], in which only the failure rates of the critical components are considered. It should be highlighted that the IGBTs' failure rates in State 0 and State 1 are different, due to the change in capacitor voltage ripple caused by the failed capacitor. From Fig. 17, the state-space equations with the initial probability values can be derived as follows:

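\[
\begin{aligned}
\frac{dP_0(t)}{dt} &= -(\lambda_{01} + \lambda_{02})\,P_0(t) \\
\frac{dP_1(t)}{dt} &= \lambda_{01}\,P_0(t) - \lambda_{12}\,P_1(t) \\
\frac{dP_2(t)}{dt} &= \lambda_{02}\,P_0(t) + \lambda_{12}\,P_1(t)
\end{aligned}
\tag{5}
\]
where λ_01, λ_02 and λ_12 are the transition rates of (4) between the 3 states, and P_0(0) = 1, P_1(0) = 0 and P_2(0) = 0. Solving these equations, the time-dependent probability of every state can be obtained. Then the whole SM's reliability can be derived as P_0(t) + P_1(t), and similarly the SM's failure rate is obtained. When redundant backups are set up for other critical components, such as series or parallel connection of IGBTs, a similar procedure can be used to calculate the whole SM's reliability. This method for reliability calculation can be used to evaluate, compare and optimize the SM designs for MMCC [58,72].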
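For a three-state chain without repair, the state probabilities have a closed-form solution, which gives a quick way to evaluate the SM reliability P_0(t) + P_1(t). The sketch below assumes transitions from State 0 to State 1, from State 0 to State 2 and from State 1 to State 2 only. The way the rates are built in the example (4·λ_cap0 for the first capacitor failure, 3·λ_cap1 plus the other components for the second) is an assumption consistent with the description above, not the exact expression of [72], and all numbers are hypothetical.

```python
import math


def sm_state_probabilities(lam01, lam02, lam12, t):
    """Closed-form P0, P1, P2 of the three-state SM chain at time t.

    Initial condition: the SM starts in State 0 (all components healthy).
    """
    s = lam01 + lam02  # total rate of leaving State 0
    p0 = math.exp(-s * t)
    if math.isclose(lam12, s):
        # Degenerate case: both exponentials share the same rate.
        p1 = lam01 * t * math.exp(-s * t)
    else:
        p1 = lam01 / (lam12 - s) * (math.exp(-s * t) - math.exp(-lam12 * t))
    return p0, p1, 1.0 - p0 - p1


def sm_reliability(lam01, lam02, lam12, t):
    """The SM works in State 0 and State 1, so R(t) = P0(t) + P1(t)."""
    p0, p1, _ = sm_state_probabilities(lam01, lam02, lam12, t)
    return p0 + p1


def sm_mttf(lam01, lam02, lam12):
    """Integral of R(t) from 0 to infinity for the three-state chain."""
    s = lam01 + lam02
    return (1.0 + lam01 / lam12) / s


if __name__ == "__main__":
    # Hypothetical rates per hour; lam_other lumps IGBTs, gate driver, etc.
    lam_cap0, lam_cap1 = 1.0e-7, 1.5e-7
    lam_other0, lam_other1 = 8.0e-7, 9.0e-7
    lam01 = 4 * lam_cap0               # any one of four capacitors fails
    lam02 = lam_other0                 # a non-redundant component fails
    lam12 = 3 * lam_cap1 + lam_other1  # second capacitor or other component
    print(sm_reliability(lam01, lam02, lam12, 10 * 8760.0))
    print(sm_mttf(lam01, lam02, lam12))
```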
The second step is to calculate the reliability of the clusters based on the SM's reliability. Whether cold-reserve or hot-reserve SMs are used in the cluster affects the result. When a failed SM is replaced by a cold-reserve SM [30], the failure rates of the other SMs remain unchanged. Therefore, the cluster's reliability can be calculated through the k-out-of-n model (6) in [73], which represents the reliability of the case in which k or more of the n SMs keep the cluster operating normally. When hot-reserve SMs are adopted and the failed SM is bypassed, the capacitor voltage stress of the other SMs will change [31][32], which slightly changes their failure rates. Strictly speaking, the Markov-chain model should then be used to calculate the cluster reliability. However, considering that there are very many failure states when the number of SMs per cluster is large, the k-out-of-n model in (6) or Monte Carlo simulation can be used as an approximation. Finally, the simplest RBD method can be used to extend the cluster reliability to the system-level reliability.
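The cluster-level and system-level steps can be sketched as follows. The binomial expression used here is the standard k-out-of-n form for identical, independent units and is assumed to correspond to (6); the exact notation of [73] may differ. Treating the system as a series RBD of 6 identical clusters (the typical MMC) and the numerical values are likewise assumptions.

```python
import math


def k_out_of_n_reliability(r_unit, k, n):
    """Probability that at least k of n identical, independent units work."""
    return sum(math.comb(n, i) * r_unit ** i * (1.0 - r_unit) ** (n - i)
               for i in range(k, n + 1))


def series_reliability(block_reliabilities):
    """Series RBD: the system works only if every block works."""
    result = 1.0
    for r in block_reliabilities:
        result *= r
    return result


if __name__ == "__main__":
    r_sm = 0.98  # hypothetical SM reliability at the time of interest
    k = 20       # SMs needed per cluster (hypothetical)
    for n in (20, 21, 22):  # 0, 1 and 2 redundant SMs per cluster
        r_cluster = k_out_of_n_reliability(r_sm, k, n)
        r_system = series_reliability([r_cluster] * 6)
        print(n, r_cluster, r_system)
```

Running the loop shows how each additional redundant SM raises the cluster reliability, which is the quantity a redundancy design would trade against cost.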

Lifetime estimation
Another index to evaluate the reliability of power converters is the lifetime. A case study on the lifetime prediction of IGBT modules in a 2.3 MW wind-power converter is provided in [70]. The main steps are: survey the field wind-speed profiles and the converter specifications, find the critical failure mechanisms and a suitable lifetime model of the IGBTs, conduct electrical-thermal simulation to analyze the distribution of the temperature profile, estimate the parameters of the lifetime models, and finally predict the lifetime. The detailed procedure can also be found in [74] for general voltage source inverters. It seems possible to apply this idea to estimate the lifetime of the SMs of MMCC. Considering the redundancy setup, the system lifetime can also be derived. This lifetime prediction method involves physical IGBT models and electrical-thermal simulation, which makes it capable of estimating the lifetime based on practical mission profiles. However, the method only considers several known failure modes of IGBTs and still relies on the lifetime models provided by the manufacturers. Regardless, the method provides a new way to estimate the lifetime more reliably.
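The final prediction step can be illustrated with a deliberately simple stand-in: a Coffin-Manson type lifetime model, N_f = A·ΔT^(-n), combined with Miner's linear damage accumulation over counted thermal cycles. These two techniques are substituted here for illustration and are not confirmed to be the models used in [70] or [74]; the parameters A and n and the cycle counts are hypothetical.

```python
def cycles_to_failure(delta_t, a, n):
    """Coffin-Manson type model: N_f = a * delta_t ** (-n).

    delta_t: junction temperature swing of one thermal cycle, in kelvin
    a, n:    fitted model parameters (hypothetical in this sketch)
    """
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    return a * delta_t ** (-n)


def yearly_damage(cycle_counts, a, n):
    """Miner's rule: sum of (cycles per year) / (cycles to failure)."""
    return sum(count / cycles_to_failure(delta_t, a, n)
               for delta_t, count in cycle_counts)


def lifetime_years(cycle_counts, a, n):
    """Predicted lifetime: the accumulated damage reaches 1 at end of life."""
    damage = yearly_damage(cycle_counts, a, n)
    return float("inf") if damage <= 0 else 1.0 / damage


if __name__ == "__main__":
    # Hypothetical yearly cycle counts, e.g. from rainflow counting of a
    # simulated junction temperature profile: (delta_t in K, cycles per year).
    profile = [(10.0, 5.0e5), (20.0, 31250.0)]
    print(lifetime_years(profile, a=1.0e12, n=5.0))
```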

Possible measures to improve reliability of MMCC
Numerous methods have been proposed in the literature for improving the reliability of power converters [63], including active thermal management [75], active fault management such as fault prognosis and diagnosis, and degraded operation under faulted conditions. For MMCC, at the SM level, parameter determination and component selection are very important for improving the reliability. At the converter level, redundancy design, fault diagnosis and fault tolerance are the key points. These two aspects are therefore the focus of the following sections.

Parameter selection considering reliability
The SM's reliability-oriented design method in [72] has been introduced in Section IV-B. It can be seen as an evaluation method for different design schemes based on empirical reliability models. The concept of the systematic safe operating area (SSOA) for designing reliable power converters [76][77] can be borrowed for the SM design to improve the reliability compared to the well-known device's SOA. The SSOA is illustrated in Fig. 18. The area with the white background is the SOA for the IGBT, with the collector-to-emitter voltage U_CE as the horizontal axis and the collector current I_C as the vertical axis. In contrast, the area with the light gray background is the SSOA for the converter, with the DC-side voltage u_DC(t) and current i_DC(t) as the axes. Furthermore, with the over-current and over-voltage limits, the area with the dark gray background is the actual operating area for the converter.
The SSOA aimed at the SM design is briefly introduced in the following. Firstly, the operating point of the SM is defined as [u_DC(t), i_DC(t)], the capacitor voltage and current, which can be measured directly or indirectly. They satisfy relationships involving i_U(t), i_V(t) and i_W(t), the three phase arm currents, and L_DC, the total stray inductance in the switching loop. The boundaries of the SSOA are determined by adding the effects of non-ideal factors such as the stray inductances, the signal transmission delay and the temperature variation to the device's reverse bias SOA (RBSOA) and short-circuit SOA (SCSOA). The boundaries can be described in matrix form, where A_RB and A_SC are the coefficient matrices and the right-side matrices represent the boundaries of the devices' RBSOA and SCSOA.
Fig. 18 Systematic-SOA compared to device's SOA [76]
When designing SMs for MMCC, the SSOA can be used to judge whether the parameters of the selected IGBT satisfy the safe operating requirement. Moreover, the SSOA can be used to set the protection limits for the SMs in the operation phase.
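Such a check can be sketched as a test of whether an operating point [u_DC, i_DC] lies inside a region bounded by linear inequalities. The form A·x ≤ b used here is an assumption made for illustration, since the exact boundary equations of [76][77] are not reproduced in this text; the coefficient rows and limits below are hypothetical numbers, not device data.

```python
from typing import Sequence, Tuple


def inside_ssoa(u_dc: float, i_dc: float,
                a_rows: Sequence[Tuple[float, float]],
                b_limits: Sequence[float]) -> bool:
    """Return True if every row satisfies a_u * u_dc + a_i * i_dc <= b.

    a_rows and b_limits stand in for a coefficient matrix and the boundary
    vector (assumed linear form, hypothetical values).
    """
    if len(a_rows) != len(b_limits):
        raise ValueError("a_rows and b_limits must have the same length")
    return all(a_u * u_dc + a_i * i_dc <= b
               for (a_u, a_i), b in zip(a_rows, b_limits))


# Hypothetical boundaries: a voltage limit, a current limit, and a combined
# limit standing in for the stray-inductance overvoltage during turn-off.
A_RB = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.5)]
B_RB = [2000.0, 1500.0, 2500.0]

if __name__ == "__main__":
    for point in [(1000.0, 500.0), (1900.0, 1400.0), (2100.0, 0.0)]:
        print(point, inside_ssoa(*point, A_RB, B_RB))
```

The same function could serve as a protection limit during operation by evaluating it on each measured operating point.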

Redundancy design, fault diagnosis and tolerant control
Redundancy design, fault diagnosis and fault tolerance are the unique and major measures for MMCC to improve the reliability, and have been researched extensively. For example, a complete fault handling process of MMC is provided in [29]. The current research status and trends are briefly introduced in the following:
• Fault detection and location. The most fragile component in MMCC is the semiconductor device, which has two main failure modes: short-circuit and open-circuit failures. Short-circuit fault detection is often integrated in the gate drivers to respond quickly and avoid shoot-through [26]. In contrast, open-circuit detection can be implemented by software methods, such as those based on the Kalman filter [27], the sliding mode observer [28] or the Luenberger observer [29]. At the same time, the faulty SM can be easily located by observing the deviation of the capacitor voltages. All the mentioned fault diagnosis methods are claimed to be effective and feasible. Thus, which one is better for practical application needs to be tested and compared, considering the effectiveness, the robustness against false alarms, the detection time, the implementation effort and the tuning effort [52]. Fault diagnosis which can quickly detect multiple simultaneous open-circuit faults still calls for more research [29].
• Redundancy design and fault-tolerant control. Because of the modular structure, redundancy can be directly realized in MMCC. Either cold reserved or hot reserved SMs can be utilized to guarantee the fault tolerance. Once an SM has failed, it can be replaced by a redundant SM immediately. A seamless transition is often preferred when the fault occurs and the fault tolerance is enabled [30]. How to set up reasonable redundancy backups, enabling a good trade-off between reliability and cost, needs further work in the future.
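The idea of locating a faulty SM from the deviation of its capacitor voltage can be sketched as below. This is a simplified illustration rather than any of the observer-based methods in [27]-[29]: it assumes that healthy SMs in a cluster are balanced around a common value, uses the median as that reference, and flags SMs whose voltage deviates by more than a fixed threshold. The function name and the threshold value are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def locate_faulty_sms(cap_voltages: Sequence[float],
                      threshold_v: float) -> List[int]:
    """Return indices of SMs whose capacitor voltage deviates from the
    cluster median by more than threshold_v (hypothetical criterion)."""
    if not cap_voltages:
        return []
    reference = median(cap_voltages)
    return [idx for idx, v in enumerate(cap_voltages)
            if abs(v - reference) > threshold_v]


if __name__ == "__main__":
    # Hypothetical measurements in volts; the SM at index 3 has drifted upward.
    measured = [1000.0, 1003.0, 998.0, 1080.0, 1001.0]
    print(locate_faulty_sms(measured, threshold_v=50.0))
```

The median is used instead of the mean so that a single drifting SM does not shift the reference; the threshold trades detection time against false alarms, in line with the comparison criteria listed above.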

Reliability-oriented design procedure of MMCC
Based on all the above statements, a reliability-oriented design procedure is proposed for MMCC, shown in Fig. 19. The procedure includes the following 8 steps:
• Select the topologies, the SM circuits, the number of SMs per cluster, the SM's specifications, the control strategies and the modulation schemes according to the mission profiles and the converter specifications. Simulations can be done to verify the feasibility.
• Configure the preliminary schemes for the redundant SMs of each cluster, including whether hot or cold reserved SMs are used and the quantity per cluster.
• Distribute the reliability requirement to every cluster and then to the single SM, based on the system reliability requirement and the previous two steps.
• Select all the critical components for the single SM, based on the SM's specifications. The IGBT selection needs to be verified by the SSOA theory.
• Calculate the single SM's reliability based on the Markov-chain model and test whether the reliability satisfies the SM's requirement. If not, the SM needs to be redesigned by adding redundant components.
• Calculate the single cluster's reliability based on the k-out-of-n model and test whether the reliability satisfies the cluster's requirement. If not, the cluster needs to be redesigned by reconfiguring the schemes for the redundant SMs.
• Assemble the clusters, calculate the system reliability based on the RBD method and test whether the reliability satisfies the system requirement. If not, restart the design procedure and try other topologies or different SM circuits.
• When the mission profiles and the reliability requirements are all satisfied, configure the fault diagnosis and fault-tolerant control schemes, which enable fast response and seamless transition.
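The allocation step and the cluster-level redesign loop of this procedure can be sketched as follows. The sketch makes several simplifying assumptions that are not part of the proposed procedure itself: the clusters are identical and in series, so the system requirement is allocated equally among them; the SMs are identical and independent; and the cluster is evaluated with the textbook binomial k-out-of-n form. All names and numbers are hypothetical.

```python
from math import comb
from typing import Optional


def k_out_of_n(r_sm: float, k: int, n: int) -> float:
    """Textbook binomial k-out-of-n reliability for identical, independent SMs."""
    return sum(comb(n, i) * r_sm**i * (1.0 - r_sm) ** (n - i)
               for i in range(k, n + 1))


def allocate_cluster_requirement(r_system_req: float, num_clusters: int) -> float:
    """Equal allocation for identical clusters in series (assumed):
    R_cluster_req = R_system_req ** (1 / num_clusters)."""
    return r_system_req ** (1.0 / num_clusters)


def min_redundant_sms(r_sm: float, k_required: int, r_cluster_req: float,
                      max_redundant: int = 20) -> Optional[int]:
    """Smallest number of redundant SMs meeting the cluster requirement.

    Returns None when the requirement cannot be met within max_redundant,
    which corresponds to restarting the design with another SM or topology.
    """
    for redundant in range(max_redundant + 1):
        if k_out_of_n(r_sm, k_required, k_required + redundant) >= r_cluster_req:
            return redundant
    return None


if __name__ == "__main__":
    # Hypothetical targets: system reliability 0.99, six clusters, 10 SMs needed.
    r_req = allocate_cluster_requirement(0.99, num_clusters=6)
    print("cluster requirement:", r_req)
    print("redundant SMs needed:", min_redundant_sms(0.95, 10, r_req))
```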

Conclusion
In this paper, an overview on the reliability of modular multilevel cascade converters is provided. Firstly, different topologies, different SM circuits and control and modulation schemes of the MMCC are briefly reviewed. Secondly, the failure cases of power converters are summarized and classified, and the main failure causes of MMCC are identified. Thirdly, the reliability evaluation methodology is applied to MMCC with consideration of redundancy. After that, some existing or promising measures to improve reliability from the design phase to the operation phase are presented. Based on the review, a reliability-oriented design procedure is proposed for MMCC. Finally, the challenges and the opportunities for improving the reliability of MMCC are addressed:

Challenges
• MMCC will play an important role in the industry and the impact of MMCC failure is great.
• MMCC has a large number of components, all of which can be seen as potential failure sources.
• Many diagnostic and protection methods have been proposed to deal with catastrophic failures of the devices, but wear-out failures due to inappropriate use have rarely been addressed.
• Currently most calculations of component failure rates are still based on the simple handbook approach, without consideration of special failure mechanisms.
• Validation of reliability estimation and lifetime prediction calls for many resource-consuming tests and field data.

Opportunities
• Modular structure simplifies realization of fault-tolerant control of MMCC through redundancy.
• Extensive on-line monitoring data from the field will enable easier identification of major failure causes.
• Emerging semiconductor and capacitor technologies enable design of more reliable SMs.
• Reliability-oriented design can increase the reliability, and thus reduce the whole life cycle cost of the system.

Fig. 2 Topology of the typical modular multilevel converter (MMC)