An Overview of Artificial Intelligence Applications for Power Electronics

This article gives an overview of the artificial intelligence (AI) applications for power electronic systems. The three distinctive life-cycle phases, design, control, and maintenance, are correlated with one or more tasks to be addressed by AI, including optimization, classification, regression, and data structure exploration. The applications of four categories of AI are discussed, namely, expert system, fuzzy logic, metaheuristic method, and machine learning. More than 500 publications have been reviewed to identify the common understandings, practical implementation challenges, and research opportunities in the application of AI for power electronics. This article is accompanied by an Excel file listing the relevant publications for statistical analytics.


I. INTRODUCTION
NOWADAYS, artificial intelligence (AI) is expanding rapidly and has been one of the most salient research areas during the past several decades [1], [2]. The aim of AI is to equip systems with intelligence that is capable of humanlike learning and reasoning. It offers tremendous advantages and has been successfully applied in numerous industrial areas, including image classification, speech recognition, autonomous cars, computer vision, etc. Given this immense potential, power electronics also benefits from the development of AI. There are various applications, including design optimization of power module heatsink [3], intelligent controller for multicolor light-emitting diode (LED) [4], maximum power point tracking (MPPT) control for wind energy conversion systems [5], [6], anomaly detection for inverter [7], remaining useful life (RUL) prediction for supercapacitors [8], etc. By implementing AI, power electronic systems are embedded with capabilities of self-awareness and self-adaptability, and therefore, the system autonomy can be improved.
Manuscript received June 4, 2020; revised August 6, 2020; accepted September 11, 2020. Date of publication September 18, 2020; date of current version November 20, 2020. This work was supported in part by the Innovation Fund Denmark through the project of Advanced Power Electronic Technology and Tools, and in part by the Villum Foundation through the project of Light-AI for Cognitive Power Electronics. Recommended for publication by Associate Editor Prof. Kyo-Beum Lee. (Corresponding author: Huai Wang.) The authors are with the Department of Energy Technology, Aalborg University, 9220 Aalborg, Denmark (e-mail: szh@et.aau.dk; fbl@et.aau.dk; hwa@et.aau.dk).
This article has supplementary downloadable material available at https://ieeexplore.ieee.org, provided by the authors.
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TPEL.2020.3024914
Meanwhile, the rapid development of data science, including sensor technology, Internet-of-Things (IoT), edge computing, digital twin [9], and big data analytics [10], [11], provides a wide variety of data for power electronic systems throughout the different phases of their life-cycle. The increasing volume of data creates immense opportunities and lays a solid foundation for AI in power electronics. AI is able to exploit data to improve product competitiveness by global design optimization, intelligent control, system health status estimation, etc. As a result, the research in power electronics can be conducted from a data-driven perspective, which is especially beneficial for complex and challenging cases.
Due to the specific challenges and characteristics of power electronic systems, e.g., high tuning speed in control and high sensitivity in condition monitoring for aging detection, the implementation of AI in power electronics has its own features that differ from those in other engineering areas, e.g., image classification. Therefore, there is a pressing need for an overview of AI in power electronics to expedite synergistic research and interdisciplinary applications. Based on a literature review, the applications of AI in power electronics are categorized in this article into three aspects, i.e., design, control, and maintenance. Fig. 1 shows the annual number of publications related to AI for power electronics since 1990. The statistical data are based on searching IEEE Xplore for the journals IEEE TRANSACTIONS ON POWER ELECTRONICS, IEEE JOURNAL OF EMERGING AND SELECTED TOPICS IN POWER ELECTRONICS, IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, and IEEE TRANSACTIONS [...]. As a result, a total of 444 relevant journal papers are identified, which can be found in the supplemental Excel file. It can be seen that the implementations of AI have increased drastically over the last few years. The number of publications for control is continuously increasing, and it is the most active research area. Since 2007, design and maintenance applications have also increased, and this trend has become more evident in the last two years.
Several existing reviews in the literature are related to this topic. In [12], the metaheuristic methods for stochastic optimization of power quality and waveform, circuit design, and control tuning are reviewed. It focuses on the optimization tasks only. The details of neural network (NN) in industrial applications are presented in [13], covering the design of network structure, training methods, and application considerations. It covers a broad scope of engineering applications beyond power electronics. In [14], a comprehensive review is given on the applications of NN in power electronics. Several specific examples of control and system identification are detailed. Nevertheless, other AI techniques, such as fuzzy logic and metaheuristic methods, are not discussed. Although these techniques are further discussed in [15], that work emphasizes illustrative examples, and an in-depth analysis of AI algorithms is not provided. In [16], an intensive discussion of metaheuristic methods for MPPT in photovoltaic (PV) systems is presented. In [17], the AI techniques applied to PV systems are reviewed, with a focus on the specific PV applications only.
Maintenance [18] in power electronics is a topic that includes reliability, condition monitoring, RUL prediction, etc. Several review papers from the past decade can be found in [19]-[22]. In [19], a state-of-the-art analysis of condition monitoring and fault detection in power electronics is presented. However, it includes only very limited coverage of AI-based fault detection methods. In [20], a review of condition monitoring techniques for capacitors in power electronic converters is presented. It includes only the AI-based parameter identification methods. In [21], the methods in prognostics and health management (PHM) of information and electronics-rich systems are summarized. That review only discusses the categories of AI algorithms in the PHM area, without algorithm details or comparative analysis. In [22], machine learning methods applied in reliability management of energy systems are summarized. It focuses on the machine learning method and the maintenance task only. A tutorial [23] on "Artificial Intelligence Applications to Power Electronics" was presented at the 2019 IEEE Energy Conversion Congress and Exposition. It serves as an introductory-level presentation. Nevertheless, the desirable details of the AI algorithms and their comparisons are not available.
As a result, a comprehensive review of the AI algorithms and applications for power electronics is still lacking. From a life-cycle perspective, this article aims to fill this gap by comprehensively reviewing and systematically consolidating the published research in power electronics using AI techniques. The contributions of this article include the following.
1) The AI algorithms in power electronics are systematically investigated from a life-cycle perspective, where the relationships of the relevant AI algorithms, their essential functions, and the relevant applications are identified.
2) A timeline map is provided to illustrate the milestones of AI algorithms and power electronic applications. Moreover, it presents the quantitative information of the method usage percentages and application trend.
3) The advantages and limitations of AI algorithms are comprehensively investigated. Exemplary applications are provided for AI in each life-cycle stage, where the challenges and future research directions are discussed.
The rest of this article is organized as follows. Section II presents the functions, methods, and milestones of AI in power electronics. The applications of AI in design, control, and maintenance are discussed in Sections III-V, respectively. The outlook on the AI applications for power electronics is put forward in Section VI. Finally, Section VII concludes this article.

II. FUNCTIONS AND METHODS OF AI FOR POWER ELECTRONIC SYSTEMS
Fig. 2 gives a summary of the methods, functions, and applications of AI for power electronics. It can be seen that AI has been extensively applied to the three distinctive life-cycle phases of power electronic systems, including design, control, and maintenance.
As a functional layer between AI and power electronic applications, the essential functions of AI are categorized as optimization, classification, regression, and data structure exploration.
1) Optimization: It refers to finding an optimal solution that maximizes or minimizes objective functions from a set of available alternatives, given the constraints, equalities, or inequalities that the solutions have to satisfy. For example, in the design task, optimization serves as a tool to explore an optimal set of parameters that maximize or minimize design goals under design constraints.
2) Classification: It deals with assigning input information or data a label indicating one of k discrete classes. Specifically, anomaly detection and fault diagnosis in maintenance are typical classification tasks that determine fault labels from condition monitoring information.
3) Regression: By identifying the relationship between input variables and target variables, the goal of regression is to predict the value of one or more continuous target variables given the input variables. For example, an intelligent controller can be built on a regression model between the input electrical signals and the output control variables.
4) Data Structure Exploration: It consists of data clustering, which discovers groups of similar data within a dataset; density estimation, which determines the distribution of data within the input space; and data compression, which projects high-dimensional data down to low-dimensional data for feature reduction. For example, in maintenance, degradation state clustering falls within the data structure exploration category.
According to the surveyed 444 relevant journal papers, Fig. 3 shows a Sankey diagram of application usage statistics of AI methods in the life-cycle of power electronic systems. Specifically, the percentages of AI application in design, control, and maintenance are 9.8%, 77.8%, and 12.4%, respectively.
Regarding the functions, the percentages of optimization, classification, regression, and data structure exploration are 33.3%, 6.6%, 58.4%, and 1.7%, respectively. It shows that most of the tasks of AI in power electronics are essentially regression and optimization. The AI methods can be generally categorized as expert system, fuzzy logic, metaheuristic methods, and machine learning. Their application percentages are 0.9%, 21.3%, 32.0%, and 45.8%, respectively. This suggests that machine learning accounts for the largest portion of AI in power electronics. These methods are detailed subsequently. Note that the investigation is comprehensive but not exhaustive. Only the relevant AI methods that are widely applied to power electronics are considered.

A. Expert System
The expert system is the earliest AI method that was effectively implemented in industrial applications [17]. The expert system [24]-[27] is essentially a database that integrates expert knowledge in a Boolean logic catalog, based on which the IF-THEN rules in human brain reasoning are simulated. It is an intelligent system simulating the inference process that answers why-and-how inquiries based on the database. The database is built from either field expert experience or simulation data, facts, and statements. It can be continuously updated. The technical details of expert system are given in [17], and several exemplary applications can be found in [15] and [28].
It is worth mentioning that the share of expert system applications is as low as 0.9% according to the usage statistics in Fig. 3. This is because the expert system is generally based on system principles and rules, which relate strongly to the system of interest and lack universality. It applies only to well-defined domains with solid expert rules. Besides, due to the rapid development of computational platforms, the functions of expert system can be replaced by other advanced AI methods (e.g., fuzzy logic and machine learning) with superior capabilities in inference and approximation.
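As a minimal sketch of the IF-THEN inference described above, the following Python snippet fires Boolean rules by forward chaining and keeps a trace that can answer a "why" inquiry. The fact names and the rules themselves are hypothetical placeholders chosen for illustration; they are not taken from the cited works.

```python
# Hypothetical rule base: each rule maps a set of Boolean facts (IF part)
# to one conclusion (THEN part). Names are illustrative assumptions only.
RULES = [
    ({"overcurrent", "high_temperature"}, "reduce_load"),
    ({"dc_link_ripple_high"}, "inspect_capacitor"),
    ({"reduce_load", "inspect_capacitor"}, "schedule_maintenance"),
]


def infer(facts):
    """Forward chaining: fire rules until no new conclusion can be added."""
    known = set(facts)
    trace = []  # (conditions, conclusion) pairs, i.e., the "why" of each result
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            # A rule fires when all of its conditions are known facts.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                trace.append((sorted(conditions), conclusion))
                changed = True
    return known, trace
```

Calling `infer({"overcurrent", "high_temperature", "dc_link_ripple_high"})` derives all three conclusions, and the returned trace lists which conditions led to each one. The sketch also makes the limitation noted above visible: the rule base only covers situations that an expert has explicitly written down.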

B. Fuzzy Logic
Similar to expert system, fuzzy logic is a rule-based method, but it extends the Boolean logic into a multivalued case. Fuzzy logic is an ideal tool to tackle system uncertainties and noisy measurements [29]-[31]. Instead of using the crisp input value directly, fuzzification is first performed with fuzzy sets consisting of several membership functions, which map the input to degrees of membership in the range of 0-1. The fuzzy input signals are then aggregated with fuzzy rules in the inference step. Defuzzification is subsequently performed on the inference result by considering the degree of fulfillment, and a crisp value is output. As a result, the crisp value is manipulated in a fuzzy space that completes a nonlinear mapping between the input and output with elaborately designed principles.
In most applications, a fuzzy logic method mainly consists of four parts [30]: fuzzification, rule inference, knowledge base, and defuzzification. First, fuzzification is performed on the input linguistic variables with membership functions, including triangular, trapezoidal, Gaussian, bell-shaped, singleton, and other customized shapes. Second, the inference module integrates the signals according to the IF-THEN fuzzy rules in the knowledge base, which are derived from expert experience. Third, defuzzification is performed on the signal for output. One example of a fuzzy rule is Antecedent: IF X is Medium AND Y is Zero, Consequent: THEN Z is Positive. For both the antecedent and the consequent, the degree of fulfillment is determined by the membership functions. The fuzzy inference schemes are categorized as Mamdani-type [30], [32]-[35] and Takagi-Sugeno-Kang-type (TSK-type) [31], [36]-[38]. For the Mamdani-type fuzzy inference scheme, the membership functions of the antecedent and the consequent are shape-based functions, e.g., triangular. For the TSK-type fuzzy inference scheme, the membership function of the antecedent part is identical to the Mamdani-type, while that of the consequent is a singleton at several constant values. Typically, more fuzzy sets are needed for the Mamdani-type scheme compared to the TSK-type scheme for the same task. Compared to the fuzzy terms in the Mamdani-type, the consequent in the TSK-type scheme can be a function, either linear or constant, which is more powerful and accurate in nonlinear approximation. More theoretical details of fuzzy logic are discussed in [15], [39].
Note that expert experience plays a critical role in the design of the membership functions and the fuzzy rules, and such a method is applicable to experts only in most cases. From this perspective, prior information and expert experience can be encoded with fuzzy logic and then incorporated with other AI techniques as a hybrid method.
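The fuzzification, rule inference, and defuzzification steps above can be sketched as follows. This is a minimal illustration of a zero-order TSK-type scheme, assuming triangular antecedent membership functions, the minimum operator for AND, singleton consequents, and weighted-average defuzzification. The set names, breakpoints, and rule table are hypothetical and would normally come from expert experience.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets on a normalized universe of discourse [-1, 1].
SETS = {
    "Negative": (-2.0, -1.0, 0.0),
    "Zero": (-1.0, 0.0, 1.0),
    "Positive": (0.0, 1.0, 2.0),
}

# Hypothetical knowledge base: IF X is <set> AND Y is <set> THEN Z = <singleton>.
RULES = [
    ("Negative", "Zero", -1.0),
    ("Zero", "Zero", 0.0),
    ("Positive", "Zero", 1.0),
    ("Zero", "Positive", 0.5),
    ("Zero", "Negative", -0.5),
]


def fuzzy_infer(x, y):
    """Map crisp inputs (x, y) to a crisp output z."""
    weighted_sum = 0.0
    total_weight = 0.0
    for x_set, y_set, z in RULES:
        # Fuzzification of each input, then AND via the minimum operator.
        degree = min(tri(x, *SETS[x_set]), tri(y, *SETS[y_set]))
        weighted_sum += degree * z
        total_weight += degree
    # Defuzzification: weighted average of the singleton consequents.
    return weighted_sum / total_weight if total_weight > 0.0 else 0.0
```

For instance, `fuzzy_infer(0.5, 0.0)` activates the "Zero, Zero" and "Positive, Zero" rules to the same degree and therefore returns a value halfway between their consequents, which illustrates the smooth nonlinear mapping between input and output.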

C. Metaheuristic Methods
Once the optimization task of specific applications is formulated, the optimal solution can be obtained by either a deterministic programming method (e.g., linear or quadratic programming) or a nondeterministic programming method, i.e., metaheuristic method. The deterministic programming methods need to calculate the gradient and Hessian matrices [40], which is challenging for most of the optimization tasks in power electronics due to the complexity. Metaheuristic methods serve as a general end-to-end tool that needs less expert experience and is efficient and scalable for various optimization tasks.
The metaheuristic methods [12] are generally developed with inspirations from biological evolution and behavior, e.g., the genetic algorithm (GA) [41], inspired by the process of natural selection, and the ant colony optimization (ACO) algorithm [42], which simulates ants finding an efficient path to food. The exploration of the optimal solution follows a trial-and-error process. The metaheuristic methods can be categorized as trajectory-based methods (tabu search method [43], simulated annealing method [44], etc.) and population-based methods [GA, particle swarm optimization (PSO) [45], ACO, differential evolution [46], immune algorithm (IA) [47], etc.]. For the trajectory-based methods, each exploration stage includes only one candidate solution, which evolves into another solution according to a certain rule. The performance of this method is mainly determined by the quality and efficiency of the rule. As a result, the convergence speed of the trajectory-based methods is generally slow, and the final solution is prone to be a local rather than a global optimum for nonconvex optimization tasks. For the population-based methods, multiple candidate solutions are randomly generated. At each iterative exploration, these candidate solutions are diversified (e.g., crossover in the GA) or incorporated and replaced with new candidate solutions to improve the quality of the population at the present generation. As a result, the suitability of the population is iteratively improved to approach the optimal solution. Compared to the trajectory-based methods, they are superior in convergence speed and global searching capability, and they are especially useful for large-scale optimization tasks. Nevertheless, the computational burden of the population-based methods is heavier. This challenge needs to be considered for online applications where efficiency and speed are of the greatest significance. Table I shows a summary of the metaheuristic methods in the area of power electronics with their advantages and limitations.
These metaheuristic methods are qualitatively compared in terms of several critical features, including implementation simplicity, global convergence, convergence speed, and parallel capability.
Due to these advantages, most of the optimization tasks in power electronics are solved with the population-based methods. It can be seen from Table I that there are various population-based methods with improved variants for optimization tasks in power electronics. They are developed and improved with different biological inspirations. In addition to the earlier widely applied metaheuristic methods, several other emerging approaches have been applied on a limited scale, e.g., biogeography-based optimization [72], crow search algorithm [73], grey wolf optimization [74], firefly optimization algorithm [16], bee algorithm [75], colonial competitive algorithm [76], teaching-learning-based optimization [77], etc. It is worth mentioning that the selection of the best method is not a simple task and is application-dependent [12]. GA and PSO are the two most popular metaheuristic methods applied to power electronics, as shown in Fig. 4. They are the fundamentals and representatives of evolutionary algorithms and swarm intelligence algorithms, respectively. Note that there is no guarantee of a global optimum for metaheuristic methods, but the solution is generally satisfactory and acceptable for most practical applications. For more theoretical details of the metaheuristic methods, readers can refer to [16] and [78].
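To make the population-based search concrete, the snippet below gives a minimal PSO sketch in plain Python. It is an illustration under stated assumptions rather than any of the surveyed implementations: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the test objective are all hypothetical choices, and the task is posed as minimization within box bounds.

```python
import random


def pso(objective, bounds, n_particles=30, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(x) over box bounds [(lo, hi), ...] with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population of candidate solutions, zero initial velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep the particle inside the feasible box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A usage example on a hypothetical two-parameter objective is `pso(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, [(-5.0, 5.0), (-5.0, 5.0)])`. Each iteration evaluates the objective once per particle, which reflects the computational burden of population-based methods mentioned above.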

D. Machine Learning
Machine learning is designed to automatically discover principles and regularities with experience from either collected data or interactions by trial-and-error. For applications in power electronics, it is categorized as supervised learning, unsupervised learning, and reinforcement learning (RL).
1) Supervised Learning: With a training dataset consisting of input-and-output pairs, supervised learning aims to establish the mapping and functional relationships between the inputs and outputs implicitly. This feature is especially useful for cases in power electronics where system models are challenging to formulate. Generally, the tasks of supervised learning include classification and regression. For classification, the output of each input-and-output pair in the training dataset is a label from a finite number of discrete categories. For example, the fault diagnosis for a multilevel inverter [94] is a typical classification task, where the discrete fault label needs to be identified given the input fault information. For a regression task, the output of each input-and-output pair consists of one or more continuous variables. An example of regression is the RUL prediction of IGBTs [114], where the output, i.e., the residual useful lifetime, is a continuous variable. Once the model is trained, it is ready to evaluate new data points that differ from the training dataset. The model capability in dealing with new data points, i.e., the ones in the testing dataset, is termed generalization. Since the training dataset comprises only a limited number of possible input-and-output pairs in most cases, generalization to new inputs is one of the most critical performance factors of supervised learning methods.
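The roles of the training dataset, the testing dataset, and generalization can be illustrated with a deliberately simple regression sketch. The data are synthetic, generated from a hypothetical linear relationship with additive noise, and an ordinary least-squares line stands in for the learned model; none of this is specific to the cited applications.

```python
import random


def make_data(n, seed):
    """Synthetic input-and-output pairs: y = 2x + 1 plus Gaussian noise (assumed)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(slope, intercept, xs, ys):
    """Mean squared error of the fitted line on a dataset."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def demo():
    xs, ys = make_data(50, seed=0)
    # Hold out the last 10 pairs; they are never seen during training.
    x_train, y_train, x_test, y_test = xs[:40], ys[:40], xs[40:], ys[40:]
    slope, intercept = fit_line(x_train, y_train)
    train_err = mse(slope, intercept, x_train, y_train)
    test_err = mse(slope, intercept, x_test, y_test)  # generalization estimate
    return slope, intercept, train_err, test_err
```

The error on the held-out pairs returned by `demo()` is the quantity of interest: a model that fits the training pairs well but performs poorly on the testing pairs generalizes badly.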
Generally, supervised learning methods can be categorized into connectionism-based methods (i.e., NN method), probabilistic graphical methods, and memory-based methods (i.e., kernel method). For NN methods, the knowledge learned from the training dataset is stored as the connection weights and structures of the network. Considerable research has been devoted to improving the performance of NN methods. For applications in power electronics, these improvements come from two aspects. The first aspect deals with enabling the NN to handle uncertainty in noisy signals to improve the method robustness. This feature is facilitated by integrating fuzzy logic into the NN as the fuzzy NN (FNN) or its variants (e.g., adaptive neurofuzzy inference system (ANFIS) [101]). The second aspect is the dynamic-performance improvement of the NN to tackle time-series dataset cases, e.g., intelligent controller, RUL prediction. Compared to the conventional NN where the network weights are independent, the transient performance is facilitated by sharing weights between different layers and network cells. The weight sharing can be implemented either on a shallow scale with a convolutional structure (e.g., 1-D convolutional NN (CNN), time-delayed NN (TDNN) [114]), or on a full and deep scale by using a recurrent unit as in the recurrent NN [105]. Generally, the modeling capability of the recurrent unit implementation is superior to the one with a convolutional structure. More theoretical details of the NN methods are discussed in [1, Ch. 5], [13], and [14].
The probabilistic graphical methods obtain knowledge from the data by using a diagrammatic representation of input-and-output pairs. The diagrammatic representation implies the conditional dependence relationship between the decision variables. The underlying relationship in the model is formulated in the Bayesian framework [1] and can be inferred in a probabilistic way. Thus, the interpretability of the model is much better compared to NN methods. Besides, the probabilistic graphical model is superior in dealing with uncertainty and incomplete knowledge. One of the typical probabilistic graphical methods is the Bayesian network [117]. More theoretical details of the probabilistic graphical methods are given in [1, Ch. 8].
For the NN methods and the graphical methods, the training dataset is discarded when the training is completed. In contrast, the training dataset in kernel methods is kept and used in the testing stage, and the learned knowledge takes the form of the identification of critical data points (e.g., support vectors in support vector machine (SVM) [126]) or a subset of the training dataset. One typical kernel method is Gaussian processes, which has been applied to the RUL prediction of IGBTs in [119]. Note that the conventional kernel methods (e.g., Gaussian processes) are computationally intensive because the whole training dataset is applied in the testing stage. To avoid the excessive computational burden, sparse solutions have been proposed, such as SVM and relevance vector machine (RVM), where the parameter estimation is improved based on Bayesian methods. With the sparse solution, only a subset of the training dataset is applied in the testing stage, and thus, it is more efficient compared to the conventional kernel methods. More theoretical details of the kernel methods are discussed in [1, Chs. 6 and 7]. Generally, the kernel methods require a smaller training dataset than the NN methods. Therefore, the kernel methods are more suitable for cases with a small dataset. However, because the training dataset is needed in the testing stage, the memory requirement of the kernel methods is higher than that of the NN methods. The involvement of the training dataset also limits the speed at the testing stage. This should be considered for online applications where the execution time is critical, e.g., control applications.
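The memory-based character of kernel methods can be seen in a small sketch. The snippet below is not a Gaussian process, SVM, or RVM; it swaps in a simpler memory-based technique, Nadaraya-Watson kernel regression with a Gaussian kernel, purely to show that "training" amounts to storing the dataset and that every prediction evaluates the kernel against all stored points. The class name, the length scale, and the example data are assumptions made for illustration.

```python
import math


def rbf(a, b, length_scale):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


class KernelRegressor:
    """Nadaraya-Watson kernel regression (a simple memory-based method)."""

    def __init__(self, length_scale=0.5):
        self.length_scale = length_scale
        self.xs = []
        self.ys = []

    def fit(self, xs, ys):
        # "Training" only stores the dataset; it is needed again at test time.
        self.xs, self.ys = list(xs), list(ys)
        return self

    def predict(self, x):
        # One kernel evaluation per stored training point, for every query.
        weights = [rbf(x, xi, self.length_scale) for xi in self.xs]
        total = sum(weights)
        if total == 0.0:
            # Query is far from all stored data; fall back to the mean output.
            return sum(self.ys) / len(self.ys)
        return sum(w * y for w, y in zip(weights, self.ys)) / total
```

With `KernelRegressor().fit([0, 1, 2, 3, 4], [0, 1, 4, 9, 16])`, each call to `predict` touches all five stored pairs, so both the memory footprint and the prediction time grow with the size of the training dataset. Sparse methods such as SVM and RVM reduce this cost by retaining only a subset of the points.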
Table II shows a summary of the supervised learning methods and their variants in power electronics, in terms of the advantages, limitations, and exemplary applications.
2) Unsupervised Learning: Compared to the supervised learning where the dataset is input-and-output pairs, unsupervised learning has no output data for the learning target during the learning process. Generally, the tasks of unsupervised learning in applications of power electronics can be categorized as data clustering and data compression.
Data clustering explores the regularities in a scattered dataset and partitions the dataset into several different groups or clusters according to their similarities. In this way, the data characteristics within the same cluster are similar to each other and different from the ones in other clusters. One typical data clustering application is the identification of discrete health states from continuous degradation data [131] in the condition monitoring of power electronic converters. The purpose of data compression is to eliminate redundant information in the dataset to reduce its number of features. For example, using principal component analysis (PCA) [127], a reduced representation of the dataset is obtained with far fewer features, which still preserves the essential information of the dataset.
Generally, these unsupervised learning algorithms serve as a data-preprocessing step before the data go to the subsequent analytics (e.g., fault diagnosis). Although this step is optional, it is beneficial for reducing the computational burden and improving the analytics accuracy. Table III gives a summary of typical unsupervised learning methods for power electronic applications. More unsupervised learning methods and theoretical details can be found in [137].
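As an illustration of data clustering without output labels, the sketch below applies a basic k-means procedure (Lloyd's algorithm) to a one-dimensional health indicator so that continuous values are grouped into discrete states. The indicator values, the number of clusters, and the quantile-based initialization are hypothetical choices for this example and are not taken from [131].

```python
def kmeans_1d(values, k, n_iter=100):
    """Cluster scalar values into k groups with Lloyd's algorithm."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialization: centers spread over the sorted data.
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels
```

For a hypothetical indicator sequence such as `[0.1, 0.2, 0.15, 1.0, 1.1, 0.9, 2.0, 2.1, 1.9]` with `k=3`, the returned labels partition the samples into three groups that could be read as healthy, degraded, and near end of life. The resulting labels can then feed a subsequent supervised step.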
3) Reinforcement Learning: In contrast to supervised learning and unsupervised learning, RL does not require a training dataset. Instead, it aims to find a suitable action strategy that maximizes the reward for a specific task, which is essentially a dynamic programming or optimization task. This goal-oriented strategy is formulated from interactions with systems or simulation models by a trial-and-error process [138]. In this way, it accumulates experience progressively and learns a specific strategy that maximizes the predefined goal. Theoretically, RL is formulated as a Markov decision process [139]. The training of RL aims to develop a Q-table that represents an action selection policy maximizing the total expected future rewards. The Q-table is an informative policy matrix from which the optimal action to be taken under particular condition variables can be read. More theoretical details of RL can be found in [139]. One application example is the MPPT [5], [6], [140]. Note that RL obtains its experience from interactions with systems instead of existing datasets. It is, thus, more favorable for cases where little knowledge of the system is available or its model is challenging to formulate.
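The Q-table idea can be sketched with tabular Q-learning on a toy MPPT-like task. Everything about the environment here is a hypothetical assumption: the operating point is a discretized duty-cycle index, the power curve is a simple parabola with a single peak, and the reward is the change in power after each action. The learning rate, discount factor, and exploration rate are likewise illustrative values.

```python
import random

N_STATES = 11          # discretized duty-cycle levels (hypothetical)
PEAK = 6               # index of the maximum power point (hypothetical)
ACTIONS = (-1, 0, 1)   # decrease, hold, or increase the duty-cycle index


def power(state):
    """Toy unimodal power curve with its maximum at PEAK."""
    return -float((state - PEAK) ** 2)


def step(state, action_index):
    """Apply an action; return the next state and the reward (power change)."""
    nxt = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    return nxt, power(nxt) - power(state)


def train(episodes=3000, steps=30, alpha=0.5, gamma=0.9, epsilon=0.5, seed=0):
    """Learn a Q-table by trial-and-error interaction with the toy model."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(steps):
            if rng.random() < epsilon:   # explore a random action
                a = rng.randrange(len(ACTIONS))
            else:                        # exploit the current Q-table
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            nxt, reward = step(s, a)
            # Q-learning update toward reward plus discounted best future value.
            q[s][a] += alpha * (reward + gamma * max(q[nxt]) - q[s][a])
            s = nxt
    return q


def greedy_rollout(q, state, steps=20):
    """Follow the learned policy (best action per state) from a start state."""
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _reward = step(state, a)
    return state
```

After `q = train()`, a call such as `greedy_rollout(q, 0)` follows the highest-valued action in each row of the table. No dataset of labeled operating points is used; the table is filled only from the rewards observed while interacting with the (here simulated) system.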
As a summary, Fig. 5 presents the usage statistics of the machine learning methods. Supervised learning is dominantly applied to power electronics. The reason is that the supervised learning is a versatile tool, which is typically the central part of the majority of machine learning-related applications in power electronic systems.

E. Timeline of Relevant AI Methods and Applications in Power Electronics
Fig. 6 summarizes the milestones of the relevant AI methods and their applications in power electronics. It includes the year when each algorithm was first proposed, its first application in power electronics, and the milestones of the relevant AI algorithms and applications for each method. It should be noted that the information is to the best knowledge of the authors. Also, the timeline is not exhaustive and does not include all of the existing AI algorithms. Instead, only the ones that show great potential in power electronics are included. According to Fig. 6, the following can be noted.
1) The application of both expert system and fuzzy logic is moderate nowadays, especially for the expert system. Before the 2000s, their practical implementations were developed under the limited performance of computing hardware, which has been significantly improved to date. This rapid development of computing hardware facilitates and accelerates the implementation of other more powerful AI methods for replacing expert system and fuzzy logic.
2) Metaheuristic methods are continuously evolving and being applied to power electronics. They are used for a complete task or for a key step jointly with other machine learning methods.
3) NN methods are the most active area of AI applications for power electronics. The reason is twofold. First, the significant development of computing hardware unleashes the potential of NN methods in dealing with complex tasks in power electronic systems. Second, the structure of NN is quite flexible in incorporating other AI methods for performance improvement, which leads to numerous method variants.
4) There is an increasing trend of applications with kernel methods and probabilistic graphical models. This is because most of these methods are formulated within the Bayesian framework, which possesses better generalization and interpretability. Moreover, their computational burden can be well handled by present-day platforms.
5) RL is the latest frontier of the machine learning methods applied to power electronics, facilitated by the rapid development of computing hardware.
The following can be noted from Figs. 2, 3, and 6 about the comparisons of different AI methods.
1) Both metaheuristic methods and machine learning can be applied to optimization tasks. Specifically, machine-learning-based optimization (i.e., RL) focuses on dynamic optimization involving decision-making (e.g., MPPT). The metaheuristic method is generally applied to static optimization (e.g., heatsink design).
2) Both fuzzy logic and machine learning can be exploited for classification tasks. Generally, machine learning is more accurate and flexible than fuzzy logic.
3) The regression task can be implemented with expert system, fuzzy logic, and machine learning. The implementation of expert system is simple but less powerful compared to fuzzy logic and machine learning. The implementation of fuzzy logic needs expert experience. Machine learning is the most popular method, and various algorithm variants have been developed. It can be incorporated with fuzzy logic for performance improvement.
4) Only machine learning can be applied to the task of data structure exploration.
The following three sections discuss the applications of the previously introduced AI methods in the design, control, and maintenance phases of power electronic systems, respectively.

III. DESIGN
Design in power electronics, encompassing topology selection, component sizing, circuit synthesis, reliability considerations, etc., is essentially an optimization task [145]. A typical procedure for the design of power electronic systems comprises the following four steps.
1) Objective formulation: Objective functions are desirable design goals to be maximized or minimized. Generally, the design goals in power electronics include component parameter [41], weight [146], volume [147], cost [146], heatsink pattern [3], area [148], power loss [62], etc. It is crucial to formulate the required or desired design requirements as explicit mathematical expressions, either as a single objective, as given in (1), or as multiple objectives, as given in (2) [12], [145]:
\[ \max_{x} \; f(x) \quad \text{s.t.} \quad g(x) \le 0, \;\; h(x) = 0, \;\; x_l \le x \le x_u \tag{1} \]
\[ \max_{x} \; f(x) = [f_1(x), f_2(x), \ldots, f_m(x)]^T \quad \text{s.t.} \quad g(x) \le 0, \;\; h(x) = 0, \;\; x_l \le x \le x_u \tag{2} \]
where $g(x)$ and $h(x)$ are inequalities and equalities, respectively. $x_l$ and $x_u$ are the lower and the upper boundaries for decision variables $x$, respectively. Here, the maximization is the goal, and the formulation applies equally to the minimization case. Note that the multiple objectives in (2) can be handled either by maximizing a scalar function $w^T f(x)$ that weights the objectives together or by optimizing the objective vector $f(x)$ directly, where the Pareto front [62] can be applied to determine the optimal solution, e.g., the nondominated sorting GA method for multiobjective design optimization of power modules in [60].
2) Constraint space: The constraint space defines feasible space, boundary, relationship, and limitation that the objective function is subjected to. These constraints include either linear or nonlinear equalities and inequalities. They are derived from the practical design requirements, e.g., geometry, volume, lifetime characteristics, cost, etc.
3) Solution exploration: The defined optimization problem is to maximize (or minimize) objective functions by adjusting the decision variables in the constraint spaces. AI methods, especially the metaheuristic methods, can be applied to this step.
4) Performance evaluation: The candidate solution can be tested against the predefined objectives by using simulation, hardware-in-the-loop testing, prototype experiment, etc. The results can be returned to previous steps for further performance improvement and optimization.
Instead of a sequential procedure, the design task is an iterative trial-and-error process. Based on the evaluation at each step, the task may be reformulated, e.g., adjusting the objectives, modifying the constraint space, reconfiguring the programming methods, etc. Conventional design in power electronics is time-consuming and needs multiple iterative steps. For example, the component alignment and the model selection rely on expert experience and intuition without ample quantitative reference. In this way, the design performance converges slowly to the required standards. This drawback can be mitigated by AI methods. They can be applied to step 1), objective formulation, for the design time reduction, and to step 3), solution exploration, for the modeling and optimization.
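The two ways of treating multiple objectives mentioned in step 1), i.e., weighting the objectives into a scalar and extracting the Pareto front, can be illustrated with a minimal Python sketch. The candidate objective vectors, the weights, and the function names below are hypothetical and chosen only for illustration; they are not taken from the cited works.

```python
from typing import List, Sequence

def weighted_sum_best(F: List[Sequence[float]], w: Sequence[float]) -> int:
    """Return the index of the candidate maximizing the scalar w^T f(x)."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in F]
    return max(range(len(F)), key=scores.__getitem__)

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b (maximization): no worse in every objective, better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(F: List[Sequence[float]]) -> List[int]:
    """Indices of the nondominated candidates."""
    return [i for i, fi in enumerate(F)
            if not any(dominates(fj, fi) for j, fj in enumerate(F) if j != i)]

# Hypothetical objective vectors, both to be maximized:
# (efficiency score, negative weight in kg).
candidates = [(0.90, -5.0), (0.95, -7.0), (0.85, -4.0), (0.88, -6.0)]

front = pareto_front(candidates)                      # nondominated designs
best = weighted_sum_best(candidates, w=(10.0, 1.0))   # one scalarized choice
```

With a linear weighting, the selected design always lies on the Pareto front, whereas the front itself leaves the final tradeoff to the designer.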

A. Design Time Reduction
The formulation of design objective needs to be improved if its evaluation is computationally intensive. One application of AI methods is a surrogate model in the objective formulation to reduce the computational effort. The surrogate model yields an identical behavior to the system dynamics that are challenging to formulate or need intensive computational efforts to characterize. In the iterative design process, AI-based surrogate model serves as a replacement that significantly reduces the computational effort.
As an application of Design for Reliability (DfR), in [80], two feed-forward NNs (FFNNs) are applied to the automated reliability design of power electronic systems. The first FFNN serves as a surrogate model emulating thermal characteristics of power converters, by which the design parameters can be mapped to the information of junction temperature variations. Subsequently, the second FFNN is applied to map the annual mission profiles (e.g., annual solar irradiation and ambient temperature) to the annual lifetime consumption. In this way, the nonlinear relationship between the designed parameters and the annual lifetime consumption is quantitatively characterized, which can accelerate the iterative design process.
Another example of AI for DfR of power electronic systems is given in [109]. With superior capability in tackling time-series data, a nonlinear autoregressive network with exogenous inputs (NARX) is applied to the thermal modeling of power electronic systems considering the thermal cross-coupling effects. The proposed NARX-based thermal model can be completed within around 109 s, which is a significant efficiency improvement compared to the 1005 s of the conventional model. The error between the temperature estimated by the NARX-based thermal model and the actual measurement is less than 1 °C. Experimental results indicate that the NARX-based thermal model can replace the conventional model with less testing efforts and much less computational burden.
Fig. 7. Nine different cell patterns for each blank cell [3]. A GA is applied to determine the optimal combination of different cell patterns for the heatsink design for minimizing the junction temperature.
In [79], considering the electrothermal interactions, an FFNN is applied to construct the component behavior model of MOSFETs without any in-depth knowledge of the device structure. Under the static state, the complicated nonlinear and temperature-dependent characteristics between the variables, including drain-to-source voltage $V_{DS}$, gate-to-source voltage $V_{GS}$, junction temperature $T_j$, and the output current $I_D$, are established by using the NN. This compact model can drastically accelerate the design simulation process with a comparable accuracy.

B. Modeling and Optimization
The modeling and optimization of power electronic systems is about specifying circuit topology, component model, component parameter, etc., such that system dimension, weight, operating frequency, etc., will result as optimal characteristics (e.g., power loss, power density) given design constraints [12]. Specifically, the optimization method is applied to the solution exploration to provide an overall optimal configuration, where metaheuristic methods in AI can be exploited. As mentioned, the selection of a suitable metaheuristic method depends on the specific application. Several exemplary applications are given as follows.
In [3], GA is combined with finite-element analysis for the automated heatsink design of a 50-kW three-phase inverter. As shown in Fig. 7, GA is applied to optimize the combination of nine customized patterns to formulate a complex cell pattern of heatsink. The goal is to minimize the junction temperature of power semiconductor devices. Compared to the conventional design with a regular cell pattern, the proposed method formulates a heatsink solution with 27% less in size and 6% lower in junction temperature.
In [62], the design of a 500-kW solar power-based microgrid system is formulated as a multiobjective optimization task, which maximizes the average power distribution and minimizes the system weight simultaneously. It explores the optimal values of four microgrid parameters, including battery voltage, PV maximum power, PV maximum power point voltage, and number of panels per string. A GA combined with the Pareto front is applied to solve the multiobjective optimization task. Besides, there is a variant of GA specifically improved for multiobjective optimization tasks, i.e., nondominated sorting GA II (NSGA-II) [63].
In [45], the PSO is applied to the circuit synthesis of a power electronic circuit, where the optimal values of components are explored to fulfill the design goals of better static and dynamic performance. For this specific case, the simulation indicates that the PSO yields a superior solution with less computational effort compared to GA.
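To make the solution exploration step concrete, the following is a minimal sketch of a standard global-best PSO in Python. It is not the implementation of [45]; the cost function, the normalized component values, the bounds, and the coefficients (inertia w, acceleration c1 and c2) are assumptions chosen only for illustration.

```python
import random

def pso(objective, bounds, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def cost(x):
    """Hypothetical design cost: squared deviation from target values."""
    inductance, capacitance = x                      # normalized, illustrative
    return (inductance - 2.0) ** 2 + (capacitance - 0.5) ** 2

best_x, best_cost = pso(cost, bounds=[(0.0, 5.0), (0.0, 5.0)])
```

In a real design flow, `cost` would be replaced by a circuit simulation or a surrogate model that scores the static and dynamic performance of each candidate.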
In [70], the ACO is applied to determine the optimal component values in a power electronic circuit, where the conventional ACO is extended to facilitate the optimization with continuous component values and accelerate the optimization process. Moreover, the component tolerance is incorporated into the optimization, which makes the proposed method more beneficial to practical applications.

IV. CONTROL
Essentially, control applications with AI methods in power electronic systems can be categorized as optimization and regression. Similar to the optimization in the design phase, the optimization-related tasks in control applications are also addressed with metaheuristic methods. Several representative applications are given below.
In [64], a GA is applied to the PID tuning of a programmable logic controller, where the optimization goal is to minimize the error between the ideal step and ramp responses and the ones obtained with the proportional term $K_P$, integral term $K_I$, and derivative term $K_D$ found by the GA. Experimental analysis indicates that the output performance of the optimized controller is very close to the ideal step and ramp responses.
In [42], to overcome the challenges of multiple maximum power points in partially shaded situations for PV systems, an ACO-based MPPT method is proposed. It is compared with conventional methods, including constant voltage tracking, perturb & observe, and PSO. The experimental results indicate that the ACO-based MPPT method is superior in global convergence and robustness to various shading patterns.
In [47], in a single-phase full-bridge inverter, an IA is applied to find the optimal sinusoidal pulsewidth modulation (PWM) control sequences of four switches minimizing the total harmonic distortion (THD) of the output waveforms. The experiment indicates that the THD by using IA is 0.79%, which is superior to that of the conventional control method of hysteresis current PWM with 1.23% and the GA solution with 0.99%. Moreover, the IA is superior to the GA in convergence speed. More examples of optimization-related control applications can be found in [12].
The regression-related tasks in control applications deal with the nonlinear mapping of system inputs and outputs in a static or dynamic way. Specifically, they are concerned with regulating systems to ensure the intended output performance in accordance with system principles. Several limitations of conventional methods are identified as follows.
1) The controller configuration requires in-depth knowledge of system control principles, which is challenging and even infeasible for complex cases. It is time-consuming for complex systems to consider the time-varying and piecewise-linear characteristics, where the controller is generally optimized at several critical operational points rather than over the full operational area, resulting in a suboptimal solution.
2) Once the controller is installed, it operates in a static way with limited adaptability, suggesting that it is only applicable to time-invariant systems. When environmental and operational conditions change, the controller is less robust to system parameter shifts and the control performance is likely to deteriorate.
3) From the efficient control perspective, an ideal controller must be able to cope with parameter tolerances with a fast transient response to maintain system stability. However, such a desired feature cannot be well fulfilled.
These limitations can be mitigated with AI methods. The regression-related control applications are discussed below in terms of fuzzy logic, NN, and RL.

A. Fuzzy Logic-Based Controller
Fuzzy logic-based methods have been widely applied to the control of power electronic systems, e.g., speed control [30], MPPT [35], energy management [149], to name a few.
In [30], a control strategy with three fuzzy logic controllers is developed for a variable speed wind generation system. The structure of the generator speed programming controller is given in Fig. 8. The control variables include the increment of the output power $\Delta P_o$ and the last variation of speed $L\Delta w_r^*$. The controller outputs the variation of speed $\Delta w_r^*$ to adjust the generator speed for a maximum wind power output. The Mamdani-type fuzzy logic is applied and the information is aggregated according to the rule matrix table, e.g., "IF $\Delta P_o$ is PS AND $L\Delta w_r^*$ is ZE, THEN $\Delta w_r^*$ is PM." The membership functions are iteratively tuned by the system simulation and experiment. A similar Mamdani-type fuzzy logic controller for the primary frequency regulation of a wind farm can be found in [34].
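The Mamdani-type inference described above can be sketched in a few lines of Python. This is a simplified illustration and not the controller of [30]: it assumes normalized inputs in [-1, 1], only three hypothetical labels (N, ZE, P) with triangular membership functions, a made-up hill-climbing rule table, min for AND, max for aggregation, and centroid defuzzification.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical labels on a normalized universe [-1, 1].
MF = {"N": (-2.0, -1.0, 0.0), "ZE": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}

# Hypothetical rule table: (power increment, last speed change) -> speed change.
RULES = [
    ("P", "P", "P"), ("P", "N", "N"), ("P", "ZE", "P"),
    ("N", "P", "N"), ("N", "N", "P"), ("N", "ZE", "N"),
    ("ZE", "N", "ZE"), ("ZE", "ZE", "ZE"), ("ZE", "P", "ZE"),
]

def infer(dp, lw, n=201):
    """Mamdani inference: min for AND, max aggregation, centroid output."""
    strengths = {}
    for a, b, out in RULES:
        s = min(tri(dp, *MF[a]), tri(lw, *MF[b]))        # rule firing strength
        strengths[out] = max(strengths.get(out, 0.0), s)
    num = den = 0.0
    for k in range(n):                                   # discretized universe
        y = -1.0 + 2.0 * k / (n - 1)
        mu = max(min(s, tri(y, *MF[lab])) for lab, s in strengths.items())
        num += y * mu
        den += mu
    return num / den if den > 0 else 0.0

# Power rose after a speed increase: keep increasing the speed.
dw_up = infer(1.0, 1.0)
# Power rose after a speed decrease: keep decreasing the speed.
dw_down = infer(1.0, -1.0)
```

The expert knowledge is contained entirely in `MF` and `RULES`, which is why tuning them by simulation and experiment dominates the design effort.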
In [36], a fuzzy logic controller is proposed for regulating the speed of a switched reluctance motor based on TSK fuzzy logic by approximating an ideal control law. The parameter is tuned by using the Lyapunov stability theorem to ensure system stability. The experimental analysis demonstrates that the developed adaptive TSK-type controller outperforms the conventional fuzzy logic controllers and the PI controller. A similar TSK-type controller can be found in [31] for approximating the typical sliding-mode control curve for integrated LED drivers. It is computationally efficient and implemented on a low-cost platform.
Although the fuzzy logic controller can handle system uncertainty, it has no internal updating mechanism, similar to conventional methods such as PID, and thus, its adaptability is limited [50]. Also, the design of membership functions and fuzzy rules requires expert experience, which highly limits the practicality of the method. Thus, such a method is applicable to experts only in most cases. Nevertheless, from this perspective, the expert experience can be captured by fuzzy logic and then incorporated with other AI techniques as a hybrid method, as discussed later.

B. NN-Based Controller
As a black-box technique, NN can approximate a wide range of nonlinear functions to arbitrary accuracy. With few requirements on system knowledge, the NN-based controller possesses several advantages, such as robustness, model-free, dynamic, adaptive, universal approximation, etc.
1) Conventional NN: The most widely used NN in power electronics is the FFNN (or backpropagation NN) with a feedforward multilayer and a backpropagation topology [14]. The respective applications essentially exploit the property of static nonlinear mapping of the FFNN.
In [82], an FFNN is applied to waveform processing and delayless filtering. With two cases of variable frequency and variable magnitude, it is shown that the FFNN can convert an m-phase waveform with an arbitrary shape into an n-phase waveform with various characteristics of magnitude and frequency. The FFNN-based waveform processing method simplifies the hardware implementation. Moreover, additional signal processing functions can be embedded easily due to the structure flexibility.
In [83], the space vector PWM (SVPWM) for a three-level voltage-fed inverter is implemented with an FFNN. The input of the NN is the sampled command phase voltages and the output is the pulsewidth patterns of SVPWM. The training data are generated by the simulation with an SVPWM algorithm. By comparing with a conventional digital signal processor (DSP)-based SVPWM solution, the performance of the FFNN-based SVPWM is verified and it can be flexibly implemented on a dedicated IC chip.
Fig. 9. Structure of an RBFN with three layers [50]. $x_i^1$ is the input of the input layer node $i$ and $y_i^1$ is its output. $y_j^2$ is the output of the hidden layer node $j$. $y_k^3$ is the output of the output layer node $k$. The input layer and the hidden layer are fully and directly connected with no weights.
In addition to FFNN, another conventional NN structure is radial basis function network (RBFN). In FFNN, the weights of input-to-hidden and hidden-to-output are simultaneously determined. For RBFN, the input layer is directly and fully connected to the hidden layer without weights. The hidden layer is connected to the output layer by weights W j , which are the only weight parameters to be determined in the training, as shown in Fig. 9. Typically, the generalization of RBFN is better than FFNN and the training speed and the execution speed are faster. An exemplary application of RBFN in a three-phase induction generator to regulate the dc-link voltage and the ac line voltage can be found in [50].
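Because only the hidden-to-output weights are trained, fitting an RBFN with fixed hidden units reduces to a linear least-squares problem. The sketch below illustrates this in plain Python under several assumptions that are not from [50]: Gaussian basis functions with fixed, evenly spaced centers and a common width, a small ridge term for numerical stability, and a half-sine target as a stand-in for a nonlinear control mapping.

```python
import math

def rbf_features(x, centers, sigma):
    """Hidden-layer outputs: one Gaussian per center, no input weights."""
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def train_rbfn(xs, ys, centers, sigma, ridge=1e-6):
    """Solve (Phi^T Phi + ridge*I) W = Phi^T y for the output weights W."""
    Phi = [rbf_features(x, centers, sigma) for x in xs]
    m = len(centers)
    A = [[sum(p[i] * p[j] for p in Phi) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(p[i] * y for p, y in zip(Phi, ys)) for i in range(m)]
    return solve(A, b)

def predict(x, W, centers, sigma):
    return sum(w * h for w, h in zip(W, rbf_features(x, centers, sigma)))

# Hypothetical training data: a half-sine sampled at 21 points.
xs = [math.pi * k / 20 for k in range(21)]
ys = [math.sin(x) for x in xs]
centers = [math.pi * k / 4 for k in range(5)]
sigma = 0.8
W = train_rbfn(xs, ys, centers, sigma)
max_err = max(abs(predict(x, W, centers, sigma) - y) for x, y in zip(xs, ys))
```

The absence of trainable input-to-hidden weights is what makes the training a single linear solve, which is one reason for the faster training noted above.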
Regarding the number of neurons, there are few principles to determine the optimal number. A generic method is to start with a relatively small number of neurons and then gradually increase it according to the training error. For the activation function in the hidden layer, there are various options, including sigmoid [4], [51], [52], [83], radial basis function [50], [150], hyperbolic tangent function [106], [151], wavelet [46], [53], [84], [152], etc. It is worth mentioning that the wavelet activation function possesses the superior capabilities of convergence speed and generalization.
2) NN With Fuzzy Logic: In control applications, parameter uncertainty and external disturbance should be well considered for system stability and robustness. As a result, an improved variant of NN, i.e., FNN, or neurofuzzy system, which is a hybridization of NN and fuzzy logic, is proposed. FNN has the merits from both aspects [100], i.e., the humanlike IF-THEN reasoning rules of fuzzy logic that incorporate expert knowledge and cognitive uncertainty, and the strong capabilities of approximation and generalization to any nonlinear systems by the NN. More theoretical details of FNN can be found in [39].
In [100], an FNN is applied to simulate the sliding-mode control of a boost converter to alleviate the chattering phenomena. The block diagram of the controller is given in Fig. 10(a) and the four-layer FNN structure is given in Fig. 10(b). The inputs of the FNN include the sliding surface $S(t)$ and its derivative $\dot{S}(t)$, which are obtained based on tracking the errors of the average output voltage $e_v$ and inductor current $e_i$, given the reference voltage $V_{ref}$ and current $i_{ref}$. The output control signal is the duty cycle $u$ of PWM. The fuzzy inference is implemented by the rule layer as $l_k = \prod_{i=1}^{n} w_{ji}^{k} \mu_j^i(x_i)$. The network output is obtained as $u = f\left(\sum_{k=1}^{N_y} w_k l_k\right)$. For the voltage control, the voltage tracking performance is evaluated by the mean-square error (MSE) of the output voltage
\[ \mathrm{MSE} = \frac{1}{T} \sum_{t=1}^{T} e_v^2(t) \tag{3} \]
where $T$ is the number of sampling instants. The network tuning aims to reduce the MSE as much as possible to output an accurate and stable voltage. The performance of the FNN can be significantly improved if the membership function is well designed. For example, in [46], an asymmetric membership function is applied to the controller of a six-phase permanent magnet synchronous motor. It indicates that the learning speed can be improved and the network structure can be simplified compared to conventional membership functions, e.g., Gaussian function [71], [99], [100].
Fig. 11. ANFIS-based controller for a PWM-inverter-fed induction motor drive [101]. It is a five-layer network structure with the capability of automatic identification of fuzzy rules.
One of the challenges of FNN is the design of the fuzzy rules, where extensive expert experience is usually needed [100]. To overcome this challenge, another typical and effective framework incorporating fuzzy logic and NN is the ANFIS, which extends the four-layer structure in Fig. 10 to a five-layer topology [101], as shown in Fig. 11. In the ANFIS, the IF-THEN fuzzy rules, which otherwise require the involvement of experts, can be generated automatically in the training. For example, in [101], a direct-torque neurofuzzy control scheme is developed for a PWM-inverter-fed induction motor drive based on an ANFIS. As shown in Fig. 11, the inputs of the ANFIS-based controller include the flux error $\varepsilon_m$ and the torque error $\varepsilon_\Psi$. Layer 1 is the membership layer with the input weights $w_m$ and $w_\Psi$. Layer 2 chooses the minimum from the inputs. Normalization is performed in layer 3. In layer 4, the outputs $o_i$ are linearly combined with the network inputs $u_d = (\varepsilon_m, \varepsilon_\Psi)$. Layer 5 gives the network outputs, i.e., the stator voltage command vectors in polar coordinates $V_c$ and $\varphi_{V_c}$. $\Delta\gamma_i$ is the increment angle and $\gamma_s$ is the actual angle of the stator flux vector. In contrast to the conventional training schemes, the parameter tuning of the ANFIS is completed interactively with the backpropagation algorithm (for membership functions) and the least square method (for parameters in the fourth layer). More theoretical details of the training methods of the ANFIS can be found in [153].

3) NN With Recurrent Units:
The NN structures in Section IV-B1 and the FNN in Section IV-B2, however, are only applicable to static relationship mapping and behavior characterization. The dynamic performance of the controller is critical for the transient response. To enable the dynamic capability of the NN controller, a memory unit with a time-delayed feedback connection $Z^{-1}$ is usually inserted to formulate a recurrent NN (RNN) [107], as shown in Fig. 12. The outputs of the network depend not only on the present inputs but also on the previous ones. As a result, the network can handle time-series data, which improves its dynamic performance and sensitivity.
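The effect of the time-delayed feedback can be seen in a minimal Elman-style forward pass, sketched below in Python. The network size and all weight values are hypothetical and fixed by hand; no training is performed, and the sketch is not the structure of any cited controller.

```python
import math

def elman_step(x, h_prev, W_in, W_rec, W_out):
    """One time step: the hidden state h depends on x and on the delayed h_prev."""
    h = [math.tanh(W_in[j] * x
                   + sum(W_rec[j][i] * h_prev[i] for i in range(len(h_prev))))
         for j in range(len(h_prev))]
    y = sum(wo * hj for wo, hj in zip(W_out, h))
    return y, h

def run(sequence, W_in, W_rec, W_out):
    """Feed a scalar input sequence and return the final output."""
    h = [0.0] * len(W_in)            # delayed state starts at zero
    y = 0.0
    for x in sequence:
        y, h = elman_step(x, h, W_in, W_rec, W_out)
    return y

# Hypothetical weights for a network with two hidden neurons.
W_in = [0.5, -0.3]
W_rec = [[0.1, 0.2], [-0.2, 0.1]]
W_out = [1.0, -1.0]

# Same present input (1.0), different input histories.
y_a = run([0.0, 0.0, 1.0], W_in, W_rec, W_out)
y_b = run([1.0, 1.0, 1.0], W_in, W_rec, W_out)
```

The two outputs differ although the present input is identical, which is the memory property that a static FFNN lacks.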
In [106], a robust controller based on RNN is proposed for single-phase grid-connected converters for better control performance in the presence of system parameter changes. The training of the RNN is completed by the Levenberg-Marquardt (LM) method [13], [82], [106]. The harmonics can be significantly reduced by using the proposed RNN-based controller, and the requirements of the high sampling and switching frequency and the damping policies for the conventional control methods can be mitigated. A similar RNN structure, which is also termed as Elman NN (ENN), can be found in [52].
In addition to the performance of dynamics, fuzzy logic is also incorporated into RNN in order to improve the performance of robustness. For example, in [99], a controller based on a TSK-type self-organizing recurrent FNN (RFNN) is proposed for a high-precision trajectory tracking control of a linear microstepping motor driver. The network structure is given in Fig. 12. The TSK-type self-organizing RFNN is applied to model the inverse dynamics of the driver. Compared to the FNN in Fig. 10(b), the key of the RFNN is the insertion of a recurrent layer, where the delayed neuron output h i (k) is returned as the neuron input to facilitate the network dynamics. The network diagram and size are adjusted by the self-organizing method, and the respective network parameters are tuned with the method of recursive least square. As a result, the network diagram and its parameters can be optimized simultaneously.
The backpropagation algorithm is based on the idea of steepest gradient descent. One of the key steps in the backpropagation algorithm is the iteration of the weight update
\[ w_{k+1} = w_k - \eta_k g_k \tag{4} \]
where $w_k$ is the current weight, $g_k$ is the current gradient, $\eta_k$ is the learning rate, and $w_{k+1}$ is the weight of the next iteration.
To calculate the gradient g k and find the steepest direction of gradient descent efficiently, various improved variants of the backpropagation algorithm have been proposed, e.g., LM method [13], [82], [106], resilient backpropagation algorithm, conjugate gradient algorithm, one-step secant algorithm, etc. Note that it is challenging to determine the most suitable training algorithm for a specific task. It depends on multiple factors, including problem complexity, dataset size, number of parameters, task types of classification or regression, etc. A useful reference can be found in MATLAB Manual of Neural Network Toolbox [40], where the theoretical details, advantages, limitations, and comparisons of these training algorithms are thoroughly analyzed with several benchmark examples. It is worth mentioning that the LM method is one of the most widely used methods for the applications in power electronics with a fast convergence speed and a high accuracy. Considering whether the training dataset is available in a batch form or in a sequential form, the training scheme of the NN can be completed in either batch learning, which is also termed as offline learning, or sequential learning, which is also termed as online learning or incremental learning.
For batch learning, the gradient g k in (4) is calculated based on all the data points in the dataset for the parameter updates. It generally applies to the case where the whole dataset is available before the NN is implemented for field application, e.g., the waveform processing and delayless filtering in [82].
For sequential learning, the gradient $g_k$ in (4) is calculated based on every newly available data point or several newly available data points forming a minibatch. Therefore, the learning process is incrementally completed. This feature is especially useful for the case where the training data can only be sequentially obtained in field applications. The intelligent controller [53] is a typical case of a sequential training scheme since the input data of the NN can only be available sequentially by interacting with the output of the control command and the system. With this adaptive capability, the NN can be reparameterized and reconfigured for tracking the system parameter shifts. One of the key steps for the sequential learning is determining a suitable learning rate $\eta_k$ in (4), since a larger $\eta_k$ will result in system instability and a smaller $\eta_k$ will lead to slow convergence. The optimal learning rate $\eta_k$ can be determined by using the metaheuristic methods in the training, e.g., PSO in [50], [52], and [53] and differential evolution (DE) in [46]. As a result, the sequential learning process can be stable and converges fast.
Fig. 13. Framework of RL in the MPPT controller of wind energy conversion systems [5], [138]. A Q-table is formulated to save the optimal generator rotor speed $w_r^*$ to be performed given the current system state $s_t$, including the current electrical output power $P_e$ and the generator rotor speed $w_r$.
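The difference between the batch and sequential schemes can be illustrated with the steepest-descent update (next weight equals current weight minus learning rate times gradient) applied to a one-parameter model. The Python sketch below is a toy under stated assumptions: the model y = w*x, the noise-free data generated with a true weight of 3, and the learning rate are all hypothetical.

```python
def grad(w, batch):
    """Gradient of the mean squared error of the model y = w * x over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

# Hypothetical noise-free samples generated with a true weight of 3.0.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 11)]

def batch_learning(dataset, eta=0.5, epochs=200, w=0.0):
    """Offline scheme: every update uses the gradient over the whole dataset."""
    for _ in range(epochs):
        w -= eta * grad(w, dataset)
    return w

def sequential_learning(stream, eta=0.5, w=0.0):
    """Online scheme: one update per newly arriving sample."""
    for sample in stream:
        w -= eta * grad(w, [sample])
    return w

w_batch = batch_learning(data)
w_seq = sequential_learning(data * 20)   # samples arrive one at a time
```

In the sequential case the weight can keep adapting as long as new samples arrive, which is the property exploited for tracking parameter shifts; the choice of `eta` governs the tradeoff between stability and convergence speed.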

C. RL-Based Controller
With RL, the controller learns a goal-oriented control strategy by interacting with the physical system or its simulation model [138]. It accumulates experience progressively and learns a specific control strategy that maximizes predefined goals.
One of the relevant applications of the RL-based controller is the MPPT in renewable energy systems [5], as shown in Fig. 13. Specifically, a real-time intelligent MPPT algorithm using RL is proposed for a wind energy conversion system. With the online learning capability of RL by interacting with the environment, an optimum control strategy is formulated in the Q-table. The Q-table consists of action values $q(s_t, a_t)$, each of which estimates the power output (or reward) obtained if action $a_t$, i.e., the expected generator rotor speed $w_r^*$, is performed given the current system state $s_t$, including the current electrical output power $P_e$ and the generator rotor speed $w_r$. As a highlight, the wind turbine parameters and the wind speed are not required. This work is further extended by integrating an NN into the Q-learning of RL [6]. In this way, the challenges in the determination of the state space are avoided. The online learning process can be reactivated once the learned optimal relationship is invalidated by the system aging behaviors. It significantly improves the autonomous capability of the wind energy conversion system. A similar example can be found in [140], where RL is applied to the MPPT control of a buck converter of PV arrays.
For the NN-based controller, the learning process is completed from examples provided by an external supervisor, whereas the RL controller learns from experience by directly interacting with the environment through actions and rewards. It is worth mentioning that the training of the RL controller is based on the interactions between the controller and the system, so an offline dataset is unnecessary in this case. As a result, the RL-based controller is beneficial to new systems without existing datasets.
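The tabular Q-learning update underlying such an MPPT controller can be sketched with a toy environment in Python. Everything below is a simplifying assumption rather than the scheme of [5]: the state is reduced to a discrete rotor-speed index, the actions are decrease, hold, and increase, the power curve is a made-up parabola, and the reward is the change in power after each action.

```python
import random

N_STATES, OPT = 7, 4          # speed indices 0..6, hypothetical optimum at 4
ACTIONS = (-1, 0, 1)          # decrease, hold, increase the speed index

def power(s):
    """Hypothetical power curve peaking at speed index OPT."""
    return -float((s - OPT) ** 2)

def step(s, a):
    """Apply an action; the reward is the resulting change in output power."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    return s_next, power(s_next) - power(s)

def train(episodes=500, steps=20, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(steps):
            ai = rng.randrange(len(ACTIONS))   # purely exploratory behavior
            s_next, r = step(s, ACTIONS[ai])
            # Q-learning update toward r + gamma * max_a' Q(s', a').
            Q[s][ai] += alpha * (r + gamma * max(Q[s_next]) - Q[s][ai])
            s = s_next
    return Q

def greedy_rollout(Q, s, steps=10):
    """Follow the learned greedy policy from state s."""
    for _ in range(steps):
        ai = max(range(len(ACTIONS)), key=Q[s].__getitem__)
        s, _ = step(s, ACTIONS[ai])
    return s

Q = train()
final_states = [greedy_rollout(Q, s) for s in range(N_STATES)]
```

No model of `power` is given to the learner; the table is built only from observed rewards, which mirrors the statement that turbine parameters and wind speed are not required.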

D. Discussions
A summary of the advantages and limitations of AI algorithms in control applications is given in Table IV. It is worth mentioning that the dynamic performance, robustness, generalization, and convergence speed of AI algorithms are critical in control applications. The algorithm complexity and computational burden are the major challenges. Thus, high-performance DSP or field programmable gate array is necessary for practical implementations.

V. MAINTENANCE
Although reliability characteristics have been elaborately considered in design and control, power electronic systems still undertake various risks and even catastrophic failures due to complex and severe working environments [18], [154], [155]. The reliability and safety of power electronic components, converters, and systems are of great importance for field applications. In maintenance, preventive activities, including condition monitoring, anomaly detection, fault diagnosis, RUL prediction, etc., are effective approaches to ensure that intended functions can be properly executed. These activities are aligned with the IEEE standard framework of PHM for electronic systems [156]. Fig. 14 presents a systematic flowchart of maintenance activities in power electronic systems. Generally, it consists of the following three parts.
1) Offline Training and Knowledge Learning: It integrates various aspects of knowledge including historical monitoring data, simulation data, accelerated aging test experiment, failure mode and effects analysis, etc. Moreover, ensemble methods or fusion techniques are typically applied to this part for performance improvement. As a result, physical system dynamics and behaviors (e.g., degradation behavior) can be accurately characterized as offline models based on the information of the unit population.
2) Condition Monitoring and Health Assessment: This part deals with the health assessment of the unit in service subjected to the online condition monitoring in field applications. The offline model is tailored and individualized to the unit in service through the model parameter tuning layer by adapting to field operational environment and workload. The functions include the noninvasive parameter identification, data preprocessing (e.g., data cleaning), feature mining, anomaly detection, fault diagnosis, and RUL prediction. In this way, insightful knowledge for decision-making can be extracted from the continuous condition monitoring information.
3) Management and Decision-Making: In this part, the supportive knowledge of health assessment is returned for optimal decision-making. With this feedback, control policies (e.g., power routing) can be adjusted to maximize the system performance given the real-time health status. Moreover, economical maintenance policy can be made to facilitate the condition-based and predictive maintenance.
Subsequently, the relevant applications of AI in maintenance in terms of these three parts are discussed in detail.

A. Condition Monitoring
Condition monitoring [20], [157], [158] in power electronics includes system parameter identification, data preprocessing, and feature mining. The condition monitoring information is applied to uncover hidden and informative insights, which serve as a basis for the subsequent PHM applications.
1) System Parameter Identification: The system parameter identification [159] deals with information acquisition for critical components. Developing specific hardware for parameter identification (e.g., temperature-sensitive electrical parameters of IGBTs [158]), however, is quite a challenging task due to features of power electronic systems, e.g., very tight space in a power module, very fast switching frequency, relatively insignificant parameter changes in terms of aging [157], etc. One of the promising solutions is noninvasive method without any extra hardware implementation, where information of interest can be inferred or estimated indirectly from available physical signals. As a result, the condition monitoring can be implemented with a sensorless and cost-efficient solution, which is favorable for industrial practitioners. Generally, the system parameter identification can be categorized into model-free and model-based methods, considering whether the system dynamics and models are required.
For the model-free method, no prior knowledge of the system dynamics is required. Essentially, it relies on the regression capability of AI algorithms to construct a relationship between the inputs and outputs. For example, in a three-phase front-end diode bridge motor drive, the a-phase current $i_{a,\mathrm{out}}$ and the dc-link ripple voltage $\Delta v_{dc}$ are considered as the inputs, and the capacitance $C$ is applied as the output for the training of an FFNN [86]-[88]. In this way, the relationship between the input signals and the capacitance is established and, thus, the capacitance can be inferred indirectly. Similarly, it is demonstrated that the capacitance can be estimated by an FFNN constructed from the frequency-domain information of the dc-link voltage ripple. The potential of the FFNN in capacitance estimation is illustrated on a hardware prototype [88].
In [108], considering the dynamic capability of RNN, an impedance identification method is proposed based on RNN to enable the stability analysis for power electronic systems over a wide frequency range. The RNN is applied to build a model that can produce identical outputs as the physical system given the same inputs. The inputs of the RNN include the three-phase voltages $v_a$, $v_b$, $v_c$. The output is the a-phase current $i_a$. As a result, the RNN-based model possesses the same frequency characteristics as the physical one. It can therefore be used for the impedance identification without interrupting the system operation.
In [103], an improved ANFIS is applied to estimate the capacitance and equivalent series resistance (ESR) of the supercapacitor. At monitoring time $t$, the inputs of the ANFIS include the supply voltage $V_t$, the supercapacitor temperature $\theta_t$, and a time series $\mathrm{ESR}_{t-400:100:t}$ consisting of five previous ESR data points. The output of the ANFIS is the ESR estimations in future $p$ steps. Experimental analysis indicates that ESR of supercapacitor can be accurately estimated and the normalized root-mean-square error of the ESR estimation is as small as 0.025 at condition monitoring time of 2600 h.
Fig. 15. Examples of model-free methods of system parameter identification with AI. (a) Capacitance identification of dc-link capacitor [88]. (b) a-phase current estimation for calculating the impedance measurement of power electronic system [108]. (c) ESR estimation in future $p$ steps for supercapacitors [103].
A summary of the framework of model-free parameter identification methods is given in Fig. 15. It can be seen that AI methods serve as the regression tool $f(\cdot)$ between the available input signals and the parameter to be monitored.
The model-free method is attractive for industrial applications owing to its lower hardware cost. However, it is typically sensitive to external noise and disturbance due to the lack of a system model. Thus, its robustness should be carefully considered. This issue can possibly be mitigated with a large amount of data in the training stage [159] to cover situations in field applications as much as possible. Nevertheless, the data collection is time-consuming and costly.
Another category of the system parameter identification is the model-based method. As the name implies, for a model-based method, system physics and models are partially known in advance and the identification model is formulated with unknown model parameters. In this way, the system identification task is equivalent to the exploration of optimal parameters of the model, which is essentially an optimization task. In this case, AI, especially the metaheuristic methods, is utilized as an optimizer to find the optimal solutions. Numerous approaches such as PSO [57], crow search algorithm [73], GA [69], etc., or their improved variants, can be exploited.
In [69], a parameter identification method for the health diagnostic of a PV panel is developed. The equivalent circuit of the PV panel is given in Fig. 16, and its system model is explicitly derived. In this model, $I_{ph}$ is the input current, $I_o$ is the output current, $v_{sh}$ is the voltage across the capacitor $C_{sh}$, $R_{sh}$ is the resistance, and $C_{sh}$ is the p-n junction capacitance. As a result, the parameter identification is equivalent to finding a parameter set $G = \{I_{ph}, I_o, v_{sh}, R_{sh}, C_{sh}, R_s\}$ that ensures an identical output as the physical system. By injecting large signal disturbances to the panel voltages in the testing stage, the dynamic response of the current-voltage characteristics is sampled to calculate the objective function $f_{obj}(G)$, where $i_p[k]$ and $i[k]$ are the current output of the model and the physical system, respectively, and $N_1$ and $N_2$ are the start index and the end index for the sampling, respectively. Subsequently, an improved GA method is used to explore an optimal solution minimizing $f_{obj}(G)$ in (6). A similar investigation can be found in [57], where a modified PSO algorithm is applied to the internal parameter identification of a PV panel.
Fig. 16. Dynamic model of a PV panel for parameter identification with the model-based method [69]. System parameters include the input current $I_{ph}$, output current $I_o$ ($i(t)$), voltage $v_{sh}$ across capacitor $C_{sh}$, resistor $R_{sh}$, p-n junction capacitance $C_{sh}$, and resistor $R_s$.
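The model-based idea can be illustrated with a minimal Python sketch. It is not the method of [69] or [57]: the series-RC circuit used as the system model, the sum-of-squared-errors objective over the samples $N_1$ to $N_2$, the basic global-best PSO, and all names and numerical settings are assumptions chosen only to show how a metaheuristic optimizer searches for the parameter set that makes the model output match the measured current.

```python
import math
import random

def model_current(params, k, v_step=10.0, ts=1e-4):
    """Assumed system model: series-RC step response i[k] = (V/R) * exp(-k*Ts/(R*C))."""
    r, c = params
    return (v_step / r) * math.exp(-k * ts / (r * c))

def f_obj(params, measured, n1, n2):
    """Assumed objective: sum of squared current errors over samples N1..N2."""
    return sum((model_current(params, k) - measured[k]) ** 2 for k in range(n1, n2 + 1))

def pso(objective, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO with positions clamped to the search bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# "Measured" current generated from hypothetical true parameters R = 5 ohm, C = 200 uF.
true_params = (5.0, 2e-4)
measured = [model_current(true_params, k) for k in range(100)]
bounds = [(1.0, 20.0), (5e-5, 1e-3)]           # assumed search ranges for R and C
best, best_val = pso(lambda p: f_obj(p, measured, 0, 99), bounds)
print("identified R, C:", best, "objective:", best_val)
```

Because the circuit equations constrain the search to two physical parameters, only a short record of samples is needed here, which reflects the reduced data requirement of model-based methods.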
Due to the involvement of system dynamics and models, the amount of data required for the estimation can be significantly reduced for the model-based methods. Also, the overfitting risk in the model-free methods can be mitigated. The model-based method also copes better with unexpected disturbances and with switching between working modes. However, due to the system complexity, the system dynamics and models are challenging to formulate in most cases.
For parameter identification methods in power electronics, the accuracy and robustness under the complex environment should be considered. For example, for the condition monitoring of power MOSFETs in [131], the device is considered as failed if there is an increase of 0.08 Ω for the degradation indicator of drain-to-source ON-state resistance $R_{DS(on)}$. Such a tiny increment is challenging to observe. Thus, more research efforts are necessary to improve the sensitivity of the AI-based parameter identification methods. Moreover, it is worth mentioning that computational burden and embedded capabilities should be considered for field applications.
2) Data Preprocessing and Feature Mining: Data preprocessing and feature mining are concerned with refining the raw data to better serve the applications, e.g., fault diagnosis. By exploring dataset structure, it includes data cleaning to reduce noise, data clustering to discover groups of similar data points, density estimation to identify the data distribution, data compression that projects high-dimensional data down to low-dimensional data to reduce the number of features, data fusion to integrate multiple information sources, etc. Typically, the performance of the subsequent PHM application, e.g., the diagnostic accuracy, can be significantly improved if the data preprocessing and feature mining are properly conducted.
In [131], a reliability assessment method for power MOSFETs based on a continuous-time Markov chain is proposed. To discretize the continuous degradation path of power MOSFETs without breaking the inherent monotonicity, a k-means method is applied to divide the evolution of drain-to-source ON-state resistance $R_{DS(on)}$ into 11 discrete states, as shown in Fig. 17.
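The discretization step can be sketched as follows. This is a minimal illustration rather than the implementation of [131]: the synthetic $R_{DS(on)}$ path, the evenly spaced initial centers, and the function name `kmeans_1d` are assumptions. In one dimension, once the cluster centers are sorted, a monotonically increasing signal is mapped to a nondecreasing sequence of state indices, which is why the monotonicity of the degradation path is preserved.

```python
def kmeans_1d(values, k, n_iter=100):
    """Plain 1-D k-means; returns the sorted centers and the state index of each sample."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]  # assumed initialization

    def nearest(v):
        return min(range(k), key=lambda j: abs(v - centers[j]))

    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v)].append(v)
        # An empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    centers.sort()
    return centers, [nearest(v) for v in values]

# Synthetic, monotonically increasing R_DS(on) path in ohms (illustrative values only).
r_ds_on = [0.25 + 0.08 * (i / 199) ** 3 for i in range(200)]
centers, states = kmeans_1d(r_ds_on, k=11)
print("state of first and last sample:", states[0], states[-1])
```

The resulting state sequence can then serve as the discrete state space of a Markov-chain degradation model.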
In [133], a health state identification method for IGBTs based on self-organizing maps (SOMs) is proposed. It is essentially a clustering task. The states of the device are clustered as the healthy state, partially degraded state, heavily degraded state, and failure state, considering the distance between the input measurements (including collector current $I_c$, collector-emitter voltage $V_{ce}$, and case temperature $T$) and the best matching unit of the trained SOMs.
In [160], a composite failure precursor of SiC MOSFETs is developed with a data fusion technique of genetic programming, which is a variant of GA. It integrates multiple degradation signals of a power semiconductor device in a nonlinear way. Since the composite failure precursor is directly optimized in terms of the RUL prediction model, the prediction accuracy is improved by 35.3% and the prediction uncertainty is reduced by 16.3%. It indicates that data fusion in condition monitoring is potentially useful especially for system-level applications (e.g., converters), where multiple physical degradation signals exist.
An integrated toolbox "Diagnostic Feature Designer" for the feature identification is available in MATLAB [161], which can be applied to the data preprocessing and feature mining as an automatic tool.

B. Anomaly Detection and Fault Diagnosis
The anomaly detection makes a binary decision and focuses on the abnormal behavior identification. It provides an indication when the rated system characteristics or nominal parameters exceed the predefined safety range. Once the anomalous behavior occurs, the fault diagnosis [19] subsequently identifies and locates the detailed failure modes. Essentially, anomaly detection and fault diagnosis are classification, regression, or clustering tasks. Based on the relationship learned in the training stage, the algorithm determines the fault label when a new fault signature becomes available. Note that the feasibility of AI-based anomaly detection and fault diagnosis is based on two assumptions [33]: First, the fault occurrence in any component has an impact on the fault signature; second, the impact on these signatures varies with different fault modes and fault locations. The methods of anomaly detection and fault diagnosis can be categorized as supervised learning methods and unsupervised learning methods.
1) Supervised Learning Methods: In [93], an FFNN is applied to establish the nonlinear relationship of the inputs and outputs of a full-bridge diode rectifier. The training of the FFNN is completed at the normal operation mode of the rectifier, as shown in Fig. 18. As a result, the mapping relationship between the inputs, including input voltage $v_i(t)$, input current $i_i(t)$, and output current $i_o(t)$, and the output voltage $v_o(t)$ is characterized, and the trained FFNN is considered as a digital emulator indicating the normal operational mode of the rectifier. This digital emulator and the physical rectifier are simultaneously operated and their outputs are compared in real time. Once the monitored output voltage of the physical rectifier significantly deviates from the output of the FFNN, it suggests that the rectifier runs into an abnormal mode, which facilitates the anomaly detection. In this case, the FFNN essentially serves as the regression tool.
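The emulator-based detection principle can be sketched in a few lines of Python. To keep the example short, a linear least-squares regressor is swapped in for the FFNN of [93]; the synthetic rectifier relationship, the noise level, the 5-sigma alarm threshold, and the injected 5 V fault are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_inputs(n):
    """Hypothetical operating points: input voltage, input current, output current."""
    v_i = rng.uniform(200.0, 240.0, n)
    i_i = rng.uniform(5.0, 15.0, n)
    i_o = rng.uniform(4.0, 12.0, n)
    return np.column_stack([v_i, i_i, i_o, np.ones(n)])  # last column is a bias term

def healthy_output(x):
    """Stand-in for the healthy rectifier: noisy output voltage for given inputs."""
    noise = 0.05 * rng.standard_normal(len(x))
    return 0.9 * x[:, 0] + 0.1 * x[:, 1] - 0.5 * x[:, 2] + noise

# Offline stage: fit the emulator on normal-operation data only.
x_train = sample_inputs(500)
v_train = healthy_output(x_train)
weights, *_ = np.linalg.lstsq(x_train, v_train, rcond=None)
threshold = 5.0 * np.std(v_train - x_train @ weights)   # assumed alarm level

# Online stage: compare the measured output with the emulator output.
x_new = sample_inputs(200)
v_meas = healthy_output(x_new)
v_meas[100:] -= 5.0                                      # injected fault: 5 V sag
alarms = np.abs(v_meas - x_new @ weights) > threshold
print("alarms on healthy samples:", int(alarms[:100].sum()))
print("alarms on faulty samples:", int(alarms[100:].sum()))
```

Only normal-operation data are needed for training, since the fault is detected as a deviation from the learned normal behavior rather than as a learned fault class.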
In [90], an open-circuit fault diagnosis algorithm is proposed for the inverter in a microgrid system subjected to varying load conditions. A signal processing method is proposed to reduce the required amount of information for the fault representation and suppress the impact of the load change. An FFNN is used as a diagnostic classifier. The computational burden of the proposed method can be reduced to 10% of that of the existing algorithms. In this case, the FFNN serves as a classification tool. Similar fault diagnosis ideas include the ANFIS to determine the severity levels of a capacitor in the dc-link filter [102].
In [112], a multiswitch fault diagnosis algorithm for voltage-source inverters is proposed. An echo state network (ESN) is used as a diagnostic classifier given small low-frequency data. Note that ESN is an improved variant of RNN that avoids exploding and vanishing gradients in the training. In this work, the diagnostic performance of ESN is compared with the FFNN, the FFNN with a wavelet activation function, and the RBFN. It indicates that the ESN is superior in the sensitivity, design process, and training speed.
In [115], a 1-D CNN is applied to the fault diagnosis of a modular multilevel converter. One advantage of 1-D CNN is that the feature extraction and diagnostic classification can be integrated together, which enables the fault diagnostics on the raw data directly. In this way, the feature extraction, which is usually experience-intensive, can be avoided. The experimental results indicate that the proposed method is highly reliable and provides a detection accuracy of 98.9% and a fault diagnostic accuracy of 99.7% within 100 ms.
In addition to the previous NN-based methods, kernel methods, including the SVM and the RVM, are also applied for anomaly detection and fault diagnosis. One advantage of the kernel methods is that the dataset size requirement is relatively lower than the NN-based methods.
In [7], based on the time-domain fault features, an SVM-based fault diagnosis method is proposed for incipient yet progressive faults of IGBTs in an inverter. The training of SVM can be completed by metaheuristic methods (e.g., PSO, GA, etc.). For a total of 41 fault classes, it achieves an average accuracy of 94.82% while being robust to both load variations and motor parameter shifts.
In [127], an RVM is applied for the fault diagnosis of a cascaded H-bridge multilevel inverter. PCA is applied to extract the fault signal feature. Experimental analysis indicates that the RVM outperforms the FFNN and the SVM, with 100% diagnostic accuracy in this specific case study. Compared to SVM with the direct fault label as its output, RVM is formulated under the Bayesian framework. It makes probabilistic outputs of the fault information, which possesses good theoretical guidance and is favorable to the uncertainty analysis on diagnostic results. Generally, for the same task, the RVM is sparser than SVM, indicating faster speed for field applications. However, the training time of RVM is generally longer than SVM.
2) Unsupervised Learning Methods: In [136], PCA is applied to the anomaly detection of SiC MOSFETs. Multiple statistical features, including kurtosis, skewness, etc., are considered as the inputs of the PCA algorithm. The output consists of a compact set of fewer features and a transformation matrix. For field applications, the newly available data are applied to the transformation matrix for the calculation of an anomaly index. Abnormal behavior is notified when the anomaly index exceeds a predefined threshold. The method is verified by a processor-in-the-loop experiment. This detection mechanism is similar to that in [93]. Other unsupervised learning methods in anomaly detection and fault diagnosis, including k-means and SOMs, can be found in [118].
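A minimal sketch of a PCA-based anomaly index is given below. It does not reproduce [136]: the four synthetic features, the choice of two retained components, the use of the squared reconstruction error as the anomaly index, and the threshold rule are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic healthy feature set: four correlated statistical features driven by
# two hidden factors plus small noise (stand-ins for kurtosis, skewness, etc.).
mixing = np.array([[1.0, 0.5, -0.3, 0.8],
                   [0.2, -1.0, 0.7, 0.4]])
healthy = rng.standard_normal((300, 2)) @ mixing + 0.05 * rng.standard_normal((300, 4))

# Training: standardize, then keep the two leading principal directions.
mean, std = healthy.mean(axis=0), healthy.std(axis=0)
_, _, vt = np.linalg.svd((healthy - mean) / std, full_matrices=False)
transform = vt[:2].T                       # transformation matrix (4 features -> 2)

def anomaly_index(x):
    """Squared reconstruction error after projecting onto the retained components."""
    z = (np.atleast_2d(x) - mean) / std
    residual = z - z @ transform @ transform.T
    return np.sum(residual ** 2, axis=1)

threshold = 3.0 * anomaly_index(healthy).max()   # assumed predefined threshold

# Field data: one healthy-like sample and one whose first feature has drifted.
normal_sample = rng.standard_normal(2) @ mixing + 0.05 * rng.standard_normal(4)
abnormal_sample = normal_sample + np.array([2.0, 0.0, 0.0, 0.0])
print("normal index above threshold:", bool(anomaly_index(normal_sample)[0] > threshold))
print("abnormal index above threshold:", bool(anomaly_index(abnormal_sample)[0] > threshold))
```

No fault labels are used in training; the drifted sample is flagged because it breaks the correlation structure learned from healthy data.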
3) Discussions: Table V summarizes the features of typical AI algorithms and their variants for anomaly detection and fault diagnosis. It can be seen that each AI algorithm possesses advantages and limitations. To fully exploit the advantages of each algorithm, it is effective to combine multiple algorithms for a decision-level fusion to improve the diagnostic accuracy and robustness. An example of decision-level fusion for fault diagnosis of IGBTs can be found in [96]. More ensemble methods to combine multiple algorithms can be found in [1, Ch. 14]. From the AI perspective, there is a negligible difference between power electronics and other engineering areas (e.g., electromechanical applications) in terms of the anomaly detection and fault diagnosis tasks. Two reviews of AI methods in anomaly detection and fault diagnosis can be found in [162] and [163].
Note that various AI methods and their variants have been successfully applied to anomaly detection and fault diagnosis. There are differences in terms of how the data are collected and types of available data in different applications, which is an important aspect of practical applications of AI. An integrated platform "Predictive Maintenance Toolbox" is available in MATLAB [164], which includes various algorithms of anomaly detection and diagnostics. It is beneficial for the method development and benchmark analysis. From the AI perspective, most of the methods can be interchangeably applied with a comparable performance in terms of the evaluation accuracy. Although the accuracy can be further improved by advanced algorithms (e.g., deep learning methods), the accuracy improvement after a high score, e.g., 90%, is relatively less significant compared with other practical concerns. More considerations should be devoted to the gap between theoretical algorithms and practical implementations, where the practical considerations include the following.
1) In addition to the single component fault, the failure mode of multiple components failing simultaneously should be considered. The dependence and coupling effects among the component failures should be incorporated into the diagnostic algorithms.
2) Considering the challenges in the data acquisition of power electronic systems, the training dataset for practical application is typically limited. This situation is even worse for a dataset with unbalanced fault labels, i.e., the ample data of the normal operation case and the scarcity of data with fault labels due to catastrophic failures. Thus, the algorithm applicability given a limited dataset size and poor dataset quality should be investigated.
3) The practicality, including computational burden, adaptive capability, robustness, difficulty of algorithm design and debugging [112], implementation cost, etc., should also be comprehensively considered.

C. RUL Prediction
Lifetime prediction in the design phase is to support the DfR, which refers to the feature of a population of units. As one of the critical aspects of PHM [165], the RUL prediction is not to predict the lifetime of a population of units. It predicts the residual lifetime of an individual unit in service based on the condition monitoring information. There are associated uncertainties in the lifetime prediction, including model calibration errors, manufacturing tolerances, variations of operational environments and workload, etc. These uncertainties result in inaccurate reliability estimates for a specific unit in field operation [166]. RUL prediction is applied as an additional tool to reduce the uncertainties for reliability-critical, safety-critical, or availability-critical applications.
The flowchart and procedures for RUL prediction are given in Fig. 19. The regression model can be established based on a historical dataset. The probability density function (pdf) of degradation level at any specific condition monitoring time can be estimated based on the regression model. The pdf of the RUL can be derived from the pdf of the degradation level. Given the fact that the system is properly functioning at condition monitoring time $t$, its RUL $l$ is defined as the residual lifetime until the degradation process $D(t)$ exceeds the failure threshold $w$, given $D_{1:j}$, the cumulative CM information up to time $t$. Note that RUL $l$ is a random variable. In addition to its expected value, the uncertainty metrics with the lower and upper confidence interval $(l_{lo}, l_{up})$ are also of great importance.
Fig. 20. RUL prediction of power MOSFETs based on ESN [111]. For the network training, the input weights $W_{in}$ and the recurrent weights $W$ are randomly generated. The output weights $W_{out}$ are estimated by least-square methods.
AI methods in RUL prediction typically deal with a nonlinear regression between the degradation information and the corresponding RUL based on the training dataset [167]. In this way, degradation patterns can be characterized. Once the degradation patterns have been learned, they can be directly projected based on the regression model to facilitate the future degradation level prediction. As a result, the RUL can be estimated.
In [111], an ESN is applied to the RUL prediction of power MOSFETs, as shown in Fig. 20. The input of the ESN is the degradation indicator drain-to-source ON-state resistance $R_{DS(on)}$ at times $k-1$ and $k$, and the output is the $R_{DS(on)}$ at time $k+1$. To facilitate the adaptation of the ESN, a particle filter is exploited to recursively update the output weights when new condition monitoring data of the in-situ device becomes available. In this way, the degradation model is adaptive to varying external environments and operational modes. Another NN method involving TDNN for the RUL prediction of IGBTs can be found in [114].
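The ESN training scheme described for [111] (random, fixed input and recurrent weights; output weights obtained by least squares) can be sketched as follows. The reservoir size, the spectral-radius scaling, the synthetic $R_{DS(on)}$ path, and the readout that also sees the raw inputs and a bias are assumptions of this sketch, and the particle-filter update of the output weights used in [111] is not included.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 50                                        # reservoir size (assumed)
W_in = rng.uniform(-0.5, 0.5, (n_res, 2))         # random input weights, kept fixed
W = rng.uniform(-0.5, 0.5, (n_res, n_res))        # random recurrent weights, kept fixed
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale to spectral radius 0.9

def reservoir_features(inputs):
    """Drive the reservoir and return [state, input, bias] for every time step."""
    x = np.zeros(n_res)
    rows = []
    for u in inputs:
        x = np.tanh(W_in @ u + W @ x)
        rows.append(np.concatenate([x, u, [1.0]]))
    return np.array(rows)

# Synthetic noisy R_DS(on) degradation path in ohms (illustrative values only).
k = np.arange(150)
r_ds_on = 0.25 + 0.001 * np.exp(0.03 * k) + 0.0005 * rng.standard_normal(k.size)

# Inputs: R_DS(on) at k-1 and k; target: R_DS(on) at k+1.
inputs = np.column_stack([r_ds_on[:-2], r_ds_on[1:-1]])
targets = r_ds_on[2:]

features = reservoir_features(inputs)
W_out, *_ = np.linalg.lstsq(features, targets, rcond=None)   # least-squares readout

rmse = float(np.sqrt(np.mean((features @ W_out - targets) ** 2)))
print(f"one-step-ahead training RMSE: {rmse:.2e} ohm")
```

Since only the output weights are trained, the fit reduces to a single linear least-squares problem, which is what makes a recursive online update of these weights practical.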
In [119], Gaussian process regression is applied to the RUL prediction of IGBTs. For the degradation modeling, the nonlinear relationship between the decrement of ON-state collector-emitter voltage $\Delta V_{ce,on}$ and the condition monitoring time is established by the Gaussian process regression. Since the Gaussian process is formulated within the Bayesian framework, it is able to predict the uncertainty of the variation $\Delta V_{ce,on}$ intrinsically. It can be seen from Fig. 21 that the error bar of the evolution of $\Delta V_{ce,on}$ is explicitly derived, which can be further utilized for the calculation of the confidence interval of RUL. Another example of the kernel method for RUL prediction can be found in [74], where an SVM is applied to the degradation modeling of a buck converter.
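How a Gaussian process yields an error bar together with the prediction can be shown with a short sketch. It is not the model of [119]: the squared-exponential kernel, the hand-picked hyperparameters, the synthetic degradation data, and the units are assumptions, and in practice the hyperparameters would be fitted to the data.

```python
import numpy as np

def rbf(a, b, length=200.0, sigma_f=1.0):
    """Squared-exponential kernel between two vectors of time instants."""
    d = a[:, None] - b[None, :]
    return sigma_f ** 2 * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(t_train, y_train, t_test, noise=0.05):
    """Posterior mean and standard deviation of a zero-mean GP at t_test."""
    K = rbf(t_train, t_train) + noise ** 2 * np.eye(len(t_train))
    K_s = rbf(t_train, t_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(t_test, t_test)) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Synthetic degradation indicator versus monitoring time in hours (illustrative only).
rng = np.random.default_rng(0)
t_train = np.linspace(0.0, 1000.0, 21)
y_train = (t_train / 1000.0) ** 2 + 0.02 * rng.standard_normal(t_train.size)

t_test = np.array([500.0, 1500.0])       # inside the data range, and far beyond it
mean, std = gp_predict(t_train, y_train, t_test)
lower, upper = mean - 1.96 * std, mean + 1.96 * std   # approximate 95% interval
for t, m, lo, hi in zip(t_test, mean, lower, upper):
    print(f"t = {t:6.0f} h: mean = {m:.3f}, interval = [{lo:.3f}, {hi:.3f}]")
```

The interval widens as the prediction moves away from the monitored data, which is the behavior that makes such error bars usable for the confidence interval of the RUL.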
To make AI-based methods of the RUL prediction more practical for field applications, more efforts should be devoted to the following aspects.
1) Uncertainty quantification: Compared to other regression-related tasks, e.g., control applications, the capability of uncertainty quantification is more critical for RUL prediction. As shown in Fig. 19, the RUL is a random variable and, thus, quantification of the confidence interval is essential for the optimal decision-making. These uncertainties come from the population heterogeneity, measurement noise, varying operational settings, etc., which should be comprehensively considered for a practical solution. Uncertainty quantification of the prediction results is rather challenging for AI methods considering their black-box feature. Several feasible approaches include the Monte Carlo methods [114], incorporating particle filter in the NN [111], and Bayesian-based AI methods (e.g., Gaussian process, RVM). Another promising direction is the stochastic data-driven methods [154], [160], [168], which can intrinsically provide the pdf of the RUL for calculating the confidence interval.
2) Adaptive capability: It is concerned with the model parameter tuning layer in Fig. 14 for connecting the offline models and the online models, which is a key step for practical applications. If a specific AI method lacks an adaptive capability, its application is limited since one prerequisite is that the training data and the test data should be generated under similar situations (e.g., external environments and operational modes) and share a high-level similarity [95]. It is challenging for power electronics since operational settings of the in-situ system (i.e., the test data) are quite different from those of the training dataset, which is generally obtained with accelerated testing experiments. The majority of the research [74], [114], [119] assumes that the operational settings of the in-situ system are identical to the training dataset (e.g., accelerated aging experiments), which may not be the case in field applications. Thus, the adaptive capability of the AI-based RUL prediction method is critical to bridge academic research and industrial applications. Other promising directions of model parameter tuning include the explicit mapping relationship derivations [169] and transfer learning [170], [171] of degradation characteristics under various operational settings (temperature, voltage, humidity, etc.). This may, however, imply intensive investigations of system models.
For the heatsink design of a converter system, a large number of decision variables, e.g., weight, volume, pattern, need to be determined, which is essentially an optimization task. The metaheuristic methods are applied to the optimization that involves an iterative trial-and-error procedure. Although the computational effort is intensive, the design task is typically performed offline. There is less requirement on the algorithm speed in this case. Although the metaheuristic method-based optimization does not ensure a global solution, the suboptimal heatsink design is still superior and satisfactory in most cases. Thus, the algorithm accuracy is not critical either. The training dataset and interpretability of the optimization process are not required.
For the intelligent controller of a converter system, the real-time control errors, e.g., the voltage error, the current error, need to be returned to the controller for the adaptive updating in an online mode. Thus, the requirements of algorithm speed and accuracy are the most critical. In addition, the controller stability needs to be theoretically ensured, and thus, the interpretability is critical. Since the intelligent controller is generally tuned online, it is unnecessary to prepare the dataset for the model training.
For the RUL prediction of switching devices in a converter system, the requirement of the algorithm speed is moderate since the device degradation is slow and the long time span of decision making is acceptable. The degradation model for the RUL prediction can be prepared in offline mode and efficiently tuned in online mode, and the computational effort in this application is moderate. Since the model accuracy is highly dependent on the dataset, the dataset requirement, e.g., dataset quality, dataset size, label balance (e.g., limited abnormal data in the training dataset), etc., is the most critical. Moreover, the interpretability of the RUL prediction results with uncertainty is critical as well. As a result, a comparison of AI algorithms in each phase of the life-cycle of power electronic systems is provided in Table VI.
It is concluded that AI possesses immense potential in power electronic systems. Many opportunities and issues are yet to be explored as follows.

1) Motivations and Justifications of AI Applied to Power Electronic Systems: Although there are numerous studies on AI for power electronic systems in the literature since the 1990s, the practical implementations in industry are still limited, which is in sharp contrast to the claimed AI potential. Deeper investigations are needed into tasks where AI can essentially outperform conventional methods. The justifications of AI-based solutions should be clearly identified by comparing to conventional methods from the industrial perspectives, e.g., implementation complexity, algorithm accuracy and robustness, algorithm accountability, extra hardware cost, computational energy consumption, embedded capability, etc.

2) Interwoven AI Implementations Through Life-Cycle Phases: Implementations of AI in each life-cycle phase of design, control, and maintenance will facilitate flexible functional interactions. This feature is beneficial to overall performance optimization and procedure simplification. It enables the system capability in managing data flow between electrical and other disciplines (e.g., mechanical area) [13] as well. For example, aging information obtained by the AI-based system parameter identification can be flexibly incorporated into the AI-based controller for reliability improvement. Therefore, more attention should be paid to the interwoven interactions powered by AI.

3) Multilevel Information Fusion: Robustness is essential for safety-critical power electronic systems. Multiple sources of information and models are available in most cases for a specific application of power electronic systems. If these information sources and models are simultaneously exploited, possible biases can be mitigated to improve robustness. Multilevel information fusion can be performed at the data-level [160], [172], feature-level, decision-level [96], and their combinations, in order to exploit the insights of each information source. For example, the well-established differential equations of power converter system can be integrated with AI as a hybrid solution for condition monitoring. As a result, the advantages from both the model-driven side and the data-driven side can be gained for better accuracy and robustness.

4) Computation-Light AI: Compared to other industrial areas (e.g., image recognition), one of the key features of power electronic systems is that there is no powerful computation unit, while real-time applications, e.g., control, impose a rigid requirement on the algorithm speed. Although complex deep learning techniques [170] can provide superior performance, they are computationally intensive for power electronic systems. A prospective direction is the computation-light AI algorithms that can be implemented on cost-effective units but provide comparable performance with deep learning algorithms.

5) Data-Light AI: One of the bottlenecks of AI implementation on power electronic systems is the dataset. For example, AI-based solutions for RUL prediction require the dataset to be versatile enough for accurate degradation behavior learning. However, the dataset size is generally small since the degradation experiments are resource-consuming. This situation is even worse for safety-critical cases. Thus, developing AI algorithms with lower dataset requirement, i.e., data-light AI solutions that can provide acceptable performance in the presence of poor datasets, is a prospective direction.

6) Explainable AI: Most of the AI algorithms in power electronics suffer from the "black-box" feature. For example, most of the AI-based solutions for RUL prediction can only provide a point estimation without sensitivity analysis and uncertainty quantification. It makes AI-based solutions opaque and less convincing for practitioners to implement in industry applications, especially for safety-critical cases. There is a pressing need to improve the algorithm transparency for explainable AI with better interpretability. Understanding how models arrive at their decisions is critical for model simplification and safety, with which AI solutions can be implemented with confidence.

7) Dataset Privacy: Increasing attention has been paid to data privacy, e.g., the General Data Protection Regulation [173] in the European Union. With these critical regulations, the training of standard AI algorithms is challenging since a centralized data collection may not be feasible in the future. Thus, for power electronics applications, it is promising to develop a collaborative learning scheme for AI algorithms without collectively aggregating data from different locations, e.g., federated learning [174]. It is well aligned with the trend of data privacy regulations for the implementation of AI solutions.

8) Power Electronics Database: Due to the complexity of system dynamics of power electronics, extensive datasets are required for the model training, especially for the maintenance applications, while the experimental testing for data collection is generally time-consuming and expensive. There is a compelling demand for building up a common power electronics data and knowledge base. These open-source datasets are critical to benchmarking algorithm performance and accelerating application development. They will benefit the global power electronics communities in academia and industry.

VII. CONCLUSION
Existing AI methods in power electronic systems are comprehensively reviewed in this article. New findings are identified as follows.
1) From the application perspective, the AI methods applied in power electronic systems can be categorized into the design, control, and maintenance phases. The usage percentage, application trend, features, and requirements of AI in each life-cycle phase are discussed.
2) From the method perspective, the AI methods applied in power electronic systems can be categorized as expert system, fuzzy logic, metaheuristic methods, and machine learning. The usage percentage, advantages, and limitations of relevant AI algorithms in each category are comprehensively compared.
3) From the function perspective, the AI-related applications essentially deal with optimization, classification, regression, and data structure exploration.
4) The milestones of relevant algorithm variants and applications are identified and organized as a timeline map.
5) For each life-cycle phase, illustrative examples are discussed and the challenges and future research opportunities are identified.