Machine Learning and Data-Driven Techniques for the Control of Smart Power Generation Systems: An Uncertainty Handling Perspective

Due to growing concerns regarding climate change and environmental protection, smart power generation has become essential for the economical and safe operation of both conventional thermal power plants and sustainable energy. Traditional first-principle model-based methods are becoming insufficient when faced with the ever-growing system scale and its various uncertainties. The burgeoning era of machine learning (ML) and data-driven control (DDC) techniques promises an improved alternative to these outdated methods. This paper reviews typical applications of ML and DDC at the level of monitoring, control, optimization, and fault detection of power generation systems, with a particular focus on uncovering how these methods can function in evaluating, counteracting, or withstanding the effects of the associated uncertainties. A holistic view is provided on the control techniques of smart power generation, from the regulation level to the planning level. The benefits of ML and DDC techniques are accordingly interpreted in terms of visibility, maneuverability, flexibility, profitability, and safety (abbreviated as the "5-TYs"), respectively. Finally, an outlook on future research and applications is presented.


Introduction
For decades, power generation has been widely recognized as a major contributor to environmental pollution and carbon emissions [1]. The power generation sector reportedly accounted for nearly two-thirds of emissions growth in 2018, with coal-fired power generation as the single largest contributor (around 30% of gross emissions) [2]. Concerned with the growing issue of climate change, major countries around the world are compelled to "hold the increase in the global average temperature to well below 2°C above pre-industrial levels" [3]. Toward this goal, efforts to reform power generation include optimizing the efficiency of the currently prevalent thermal power generation and expanding the penetration of sustainable energy, including hydropower, solar power, and wind power.
Control and optimization are essential for the efficient and safe operation of these power generation systems [4]. Given the multiple time-scale characteristics of multiple layers, a hierarchical control framework is generally deployed [5,6] for power generation systems to accomplish the salient task for each level, as shown in Fig. 1. At the lowest measurement process level, visibility must be maintained while important variables are measured and monitored. Based on these variables, regulatory controllers are placed in the field to steer each single-loop process [7], such as temperature, pressure, and water level, to the operating point designated by the upper-level supervisory control level. In this regard, the task of the regulatory control level is referred to as "maneuverability" in this paper, which describes how quickly and stably the targeted loop can act when desired. The supervisory control level employs advanced control algorithms to maximize the flexibility of many interacting loops by accounting for multivariable couplings while satisfying operational constraints [8]. At the highest level of economic planning, overall efficiency or profit metrics are formulated and optimized to provide steady-state set-points for the lower layers of dynamic controls [9]. In addition to the bottom-to-top control levels, fault detection and diagnosis (FDD) is essential for safe operation and longer plant lifetimes [10]. The hierarchical structure in Fig. 1 can be used to manage either a complete power generation system, such as a fuel cell unit, or a subsystem, such as the boiler combustion furnace of a coal-fired power plant. Traditionally, developing an accurate model for each level in Fig. 1 is critical in order to fulfil multiple objectives. The monitoring of internal variables for visibility is usually realized by a state observer or a Kalman filter based on a state-space (SS) model, as in battery core temperature estimation [11].
The widely used proportional-integral-derivative (PID) controller for regulatory maneuverability typically requires a process model for parameter tuning [12]. For the flexibility level, model predictive control (MPC) accounts for the largest share of the supervisory control algorithms. MPC formulates the multivariable constrained optimization problem into a receding-horizon quadratic optimization framework using model-based output prediction. A typical MPC application is demonstrated in Ref. [13] for the supervisory control of a solar combined cycle plant. For the economic planning level, dynamic programming serves as a very popular algorithm to schedule the energy flow demand among different power sources, typically at an hourly rate, and can be applied to, for example, the energy cost minimization of a complex tri-generation plant [14] or the operational cost minimization of a hybrid power plant [15]. Fault detection is usually carried out based on a model known a priori, as is the case in a recent application in the air-feed system of a fuel cell [16]. A significant trend in recent years is the integration of the economic planning and supervisory control levels, based on the framework of economic model predictive control (EMPC). EMPC has the capacity to realize economic optimization and dynamic operations simultaneously by directly formulating an economic index subject to the system model and various constraints. EMPC has already been studied in the throttling loss minimization of a boiler-turbine unit [17] and the comfort maximization of a building cogeneration system [18]. Although efficient, model-based methods are gradually becoming incapable of dealing with the ever-growing scale of energy systems with various uncertainties. This paper summarizes several typical uncertainties that are commonly encountered at each of the levels listed in Fig. 1. These uncertainties are discussed one by one in the following sections.
The 21st century is witnessing the booming prosperity of machine learning (ML) and data science [19]; such a boom may be the key to addressing growing difficulties regarding scalability and uncertainty. In this era of big data, many disciplines, such as particle physics [20], material science [21], and process system engineering [22], have seen a drastic shift from model-based analysis to ML and data-driven (DD) methods. ML and DD techniques have revolutionized the monitoring, control, and optimization of modern energy systems, including conventional fossil fuel power plants and renewable energy systems. Common ML algorithms include unsupervised learning, supervised learning, and reinforcement learning (RL) [23], each of which has been applied at different levels of energy systems to address different problems. DD techniques usually use real-time or historical data to directly control the process, including iterative feedback tuning (IFT) [24], iterative learning control (ILC), and active disturbance rejection control (ADRC) [25,26], among other techniques. DD methods usually have an extended scope and run faster than ML methods; they have widespread usage in meeting the high real-time capability requirements of regulatory control levels.
This paper does not attempt to provide a comprehensive review of every method in all energy system applications; rather, it aims to demonstrate how ML and DD methods can be suitably deployed to improve the visibility, maneuverability, flexibility, profitability, and safety (abbreviated as the "5-TYs") of power generation systems in order to handle the uncertain challenges at each level. Following from Fig. 1, the 5-TYs can be defined as follows:
Visibility: The measurement and transmission of measurable variables and the estimation of internal unmeasurable variables.
Maneuverability: The rapidity and accuracy of the response of the bottom-level regulatory control, mostly in single-loop processes.
Flexibility: The extent to which multivariable coordination can reach in the supervisory control level.
Profitability: The economic cost or benefit of the whole system or of an important subsystem.
Safety: The FDD of the system, which prevents danger to the energy generation system.
In smart power generation, the visibility level is the basis of the other levels, as it involves sensing the internal conditions for use in control, optimization, and diagnosis. A strong maneuverability level permits flexibility and profitability, while the safety level is the watchdog that protects the whole system. This paper comprehensively reviews ML and DD methods:
From conventional thermal power generation to the emerging field of renewable energy;
From the deterministic scenario to the stochastic environment;
From the bottom level to the top level of the whole operations management framework.
The motivations of this paper in choosing the perspective of uncertainty handling are as follows:
Uncertainties widely exist in all levels of power generation. As Roger Brockett said, "if there is no uncertainty in the system, the control, or the environment, then feedback control is largely unnecessary" [27].
The nature of uncertainties differs at different levels, and special care is required to handle each uncertainty. For example, disturbance uncertainty at the maneuverability control level should be estimated and rejected, while environmental uncertainty at the profitability level should be modeled as a stochastic process and then taken into consideration during economic optimization.
This paper focuses on the power generation side; literature on the power grid will not be discussed. The rest of this paper is organized as follows. Bottom-level visibility and maneuverability, in which DD and ML algorithms must respond quickly to regulation requirements, are discussed in Section 2.
Section 3 reviews DD model-based predictive control for supervisory flexibility and various unsupervised and RL methods in the energy system planning level, at which the computational time ranges from minutes to hours. The DD FDD methods are reviewed and compared with the model-based methods in Section 4 for power generation systems. Section 5 concludes the survey and depicts future research for smart power generation.

Visibility and maneuverability
Visibility requirements concern variable measurement, quantitative process characterization, and hidden-variable soft sensing. Inevitable stochastic noise in the measured signals is the primary uncertainty to be addressed at this level. Maneuverability is realized based on process identification and measured or estimated signals from the visibility level, with the primary goal of uncertain disturbance rejection.

Dynamic characterization
System identification is a classical DD method for dynamic system characterization. The process is generally treated as a black box due to the difficulties in physical modeling. Since the 1960s, this discipline has received considerable attention and attained great success, even preceding the prosperity of ML [28]. It is used to characterize the underlying structure and parameters behind the input/output data of power generation processes by applying certain excitation signals as the control input. Classical step response-based transfer function identification is the most common method used in power plants. Applications of step response identification to energy systems include the water level identification problem in a regenerative heater [29], fuel cell temperature identification [30], and the multivariable fluidized-bed combustor [31]. The classical step response method has proved to be incapable of identifying high-order processes in the presence of measurement noise [32]. To mitigate this issue, a hybrid time and frequency domain identification method is developed in Ref. [33] for heat exchangers, a common high-order component in energy systems.
Sensor noise is the central issue to be addressed by modern systems identification methods. Additive white Gaussian noise (AWGN), which primarily originates from thermal noise, is the most frequently encountered form of sensor noise in power generation systems. Multiple mature DD methods have been developed to address AWGN in energy systems; the most common method is the use of a minimization criterion such as the square error [34]. A single-input single-output (SISO) example can be found in Ref. [35], where an adaptive recursive least-squares (ARLS) method is used to identify in real time the regression parameter of the fuel cell hybrid system model (that is, a linear difference equation with AWGN, or an "autoregressive with extra input (ARX)" model). This ARX-based recursive least-squares (RLS) identification method is one of the most popular AWGN-effect removal methods in almost every power generation sector, including the wind turbine generator sector [36], solar power generation sector [37,38], thermal power plant sector [39], and energy storage systems sector [40,41]. For non-Gaussian colored noise, the battery parameter identification study in Ref. [42] introduces an instrumental variable method that improves the least-squares identification method over the conventional RLS.
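As an illustration of the ARX-based RLS idea described above, the following minimal sketch estimates the two parameters of a first-order ARX model, y[k] = a*y[k-1] + b*u[k-1] + e[k], from noisy data. The model order, the forgetting factor, the function names (rls_arx, simulate_arx), and the simulated plant values are all assumptions chosen for illustration; they are not taken from the cited studies.

```python
import random

def rls_arx(u, y, forgetting=0.99, p0=1000.0):
    """Recursive least squares for y[k] = a*y[k-1] + b*u[k-1] + e[k].

    Returns the estimate [a, b]. The forgetting factor and the initial
    covariance p0 are illustrative choices, not values from the paper.
    """
    theta = [0.0, 0.0]                   # current estimate of [a, b]
    P = [[p0, 0.0], [0.0, p0]]           # covariance-like matrix (symmetric)
    for k in range(1, len(y)):
        phi = [y[k - 1], u[k - 1]]       # regressor vector
        P_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = forgetting + phi[0] * P_phi[0] + phi[1] * P_phi[1]
        gain = [P_phi[0] / denom, P_phi[1] / denom]
        error = y[k] - (theta[0] * phi[0] + theta[1] * phi[1])  # prediction error
        theta = [theta[0] + gain[0] * error, theta[1] + gain[1] * error]
        # P <- (P - gain * phi^T P) / forgetting, using phi^T P = (P phi)^T
        P = [[(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(2)]
             for i in range(2)]
    return theta

def simulate_arx(a, b, n, noise_std, seed=0):
    """Generate input/output data from a hypothetical ARX plant with AWGN."""
    rng = random.Random(seed)
    u = [rng.choice([-1.0, 1.0]) for _ in range(n)]   # random binary excitation
    y = [0.0]
    for k in range(1, n):
        y.append(a * y[k - 1] + b * u[k - 1] + rng.gauss(0.0, noise_std))
    return u, y

u, y = simulate_arx(a=0.9, b=0.5, n=500, noise_std=0.01)
a_hat, b_hat = rls_arx(u, y)   # estimates should lie close to 0.9 and 0.5
```

The random binary input is used here only because it is persistently exciting for this small model; a real plant test would be subject to operational limits on the excitation signal.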
The noise problem becomes even more intractable when it comes to the multi-state system described by the SS model. For a given SS physical model with unknown parameters (i.e., a grey box), the Cramer-Rao bound analysis is used for the parameter identification of battery [43] and hybrid energy storage systems [44] to handle AWGN in the battery voltage measurement. To circumvent analytical difficulties in theoretical solutions, heuristic optimization methods are extensively used to identify the SS model parameters of energy systems, such as in fuel cells [45], solar cells [46], and water turbines [47]. For a black-box system without any information on the physical mechanism and the SS model order, subspace identification (SID) is usually applied. Examples of such systems include fuel cells [48], power plant reheated temperatures [49], and fluidized-bed combustors [50].
The above system identification methods have conventionally required a specific type of input excitation signal and work mostly on linear systems. This convention has changed with the development of ML methods, which are able to identify a complex nonlinear system based primarily on a massive data record. Shallow neural networks (NNs) are among the most popular methods applied in energy systems, such as in the dynamic modeling of fuel cells [51,52], boiler-turbine units [53,54], and solar power generation [55]. To reduce the structural risk, a support vector machine (SVM) is also widely used for energy systems identification [56-58]. In the past decade, along with the resurgence of deep learning, long short-term memory (LSTM) has become increasingly prevalent because it better handles the time-series data of power generation systems [59].

Soft sensing
Since some of the critical variables in energy systems may not be directly measurable, soft-sensing techniques, including model-based state estimation [60] and DD algebraic correlation [61] algorithms, effectively visualize internal phenomena and provide feedback signals for the upper control levels. Model-based state estimation usually suffers from stochastic noise uncertainty and sensor inaccuracy. Some DD methods have been incorporated to remedy this insufficiency, such as battery core temperature estimation based on state augmentation and feedback correction [11].
DD algebraic correlation-based soft-sensing methods aim to estimate unmeasurable variables (also called primary variables) based on measurements of secondary variables [62]. Although not directly measurable while the system is in operation, primary variables can be measured offline and/or accessed intermittently with a high cost per sample. Therefore, the essential task of soft sensing is to determine the relationship between the primary variables and secondary variables based on the finite observed data. To this end, regression or curve fitting can be used. For example, an evidential regression model was learned as a soft sensor to monitor the powder concentration in a coal mill [63], and the partial least-squares (PLS) regression was trained to predict the NOx emission in a 1000 MW power plant [64].

Regulatory control
To implement strong maneuverability, many feedback controllers are deployed at the regulatory control level. This level receives sensed signals from the visibility level and reference commands from the upper levels. Its primary goal is to mitigate the effects of unmeasurable and uncertain disturbances [65]. Modeling each loop and designing individual feedback controllers is time-consuming and costly. Therefore, DD control methods now play a central role in the industrial regulatory control level [26]. This paper reviews the applications of PID control, ADRC, and ILC for some typical disturbances.
PID control is still the dominant controller in power generation systems due to its ease of use and negligible computation time in the fast-response-requiring environment of rapid maneuverability [12]. PID control uses a combination of the proportional, integral, and derivative terms of the real-time error, rather than a physical model, to adjust the actuator and maintain device operations under optimal conditions. The difficulty usually lies in tuning the controller parameters. ML techniques are sometimes incorporated to improve the performance; examples include NN-enhanced PID control application in a thermal power plant [66], a fuel cell [67], a solar power plant [68], and a wind turbine [69]. In addition, fuzzy logic is very popular for adjusting PID parameters online in applications such as wind turbines [70], fuel cells [71], solar power generation [72], and combined cycle power plants [73]. To fully exploit the potential of historical data, IFT has also been investigated for tuning the PID parameters of a boiler-turbine unit [74]. IFT is an interesting approach to iteratively improve the control performance by learning from the performance of previous tasks.
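To make the PID computation concrete, the following minimal sketch closes a discrete PID loop around a hypothetical first-order process, tau*dy/dt = -y + u. The plant, the gains, the step size, and the function name simulate_pid are assumptions for illustration only and do not correspond to any of the cited applications.

```python
def simulate_pid(kp, ki, kd, setpoint=1.0, dt=0.01, steps=2000, tau=1.0):
    """Run a discrete PID loop on a hypothetical first-order process.

    Returns the process output after `steps` samples.
    """
    y = 0.0                      # process output (e.g., a temperature deviation)
    integral = 0.0               # running integral of the error
    prev_err = setpoint - y      # previous error, used for the derivative term
    for _ in range(steps):
        err = setpoint - y
        integral += err * dt
        derivative = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * derivative   # PID control law
        prev_err = err
        y += dt * (-y + u) / tau                          # forward-Euler plant step
    return y

y_pid = simulate_pid(kp=2.0, ki=1.0, kd=0.05)   # integral action removes the offset
y_p = simulate_pid(kp=2.0, ki=0.0, kd=0.0)      # proportional-only leaves an offset
```

In this toy setting the proportional-only loop settles at kp/(1+kp) of the set-point, which illustrates why the integral term matters and why tuning the three gains is the main practical difficulty.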
Due to the limitations of PID control in dealing with nonlinearity and model uncertainty, ADRC is emerging as a disruptive DD control technology. Like PID control, it does not require a physical model for controller design [75]. The primary advantage of ADRC over PID control is that it can produce a satisfactory performance in both set-point tracking and disturbance rejection, which is attributed to its two-degrees-of-freedom structure. The DD compensation mechanism of ADRC is depicted in Fig. 2. An extended state observer is first designed to estimate the unknown dynamics and external disturbances, which are then directly compensated for in the control input through an analysis of the input and output data. The enhanced plant (that is, the grey block in Fig. 2) can be approximately compensated for as a cascaded-integrator process, so that the outer-loop controller can be readily designed. It is revealed in Ref. [29] that ADRC is able to accommodate actuator saturation with application to a regenerative heater in a 1000 MW power plant. Tuning of ADRC is discussed via an experimental application in the boiler furnace control [76]. The fluctuation of the power plant superheated temperature is reduced significantly by introducing a cascaded ADRC structure [77]. Recently, ADRC has also been introduced into the regulatory control of wind turbines [78], photovoltaic generation [79], and fuel cells [80,81].
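The compensation mechanism can be sketched for a first-order loop as follows. The sketch assumes a linear extended state observer with bandwidth-style gains (beta1 = 2*omega_o, beta2 = omega_o^2), a proportional outer loop, and a hypothetical plant dy/dt = -y + u + d whose dynamics are hidden from the controller; these choices, and the name simulate_adrc, are illustrative assumptions rather than the design used in the cited works.

```python
def simulate_adrc(setpoint=1.0, b0=1.0, omega_o=20.0, omega_c=4.0,
                  dt=0.001, steps=10000):
    """First-order linear ADRC on a hypothetical plant dy/dt = -y + u + d.

    The controller only assumes dy/dt = f + b0*u, where f lumps the unknown
    dynamics and the external disturbance. Returns (final output, final
    estimate of f).
    """
    beta1, beta2 = 2.0 * omega_o, omega_o ** 2   # observer gains
    y = 0.0                                      # measured output
    z1, z2 = 0.0, 0.0                            # estimates of y and of f
    for k in range(steps):
        d = 0.5 if k >= steps // 2 else 0.0      # step disturbance at mid-run
        # Cancel the estimated total disturbance, then apply a proportional law
        # to the resulting integrator-like plant.
        u = (omega_c * (setpoint - z1) - z2) / b0
        # Extended state observer driven only by input/output data.
        obs_err = y - z1
        z1 += dt * (z2 + b0 * u + beta1 * obs_err)
        z2 += dt * (beta2 * obs_err)
        # True plant, unknown to the controller.
        y += dt * (-y + u + d)
    return y, z2

y_final, f_hat = simulate_adrc()
```

At steady state in this example the total disturbance is f = -y + d = -0.5, so f_hat settling near -0.5 while y_final returns to the set-point shows the observer absorbing both the unmodeled dynamics and the external disturbance.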
ILC has been specifically proposed for use in addressing periodic disturbances [82], and has gained wide attention from the control community, although applications in power generation systems are relatively limited. ILC gradually modifies the control action at each time step by learning the corresponding time steps in the previous sequences. Typical periodic disturbances and explorative ILC applications in power generation systems include the fuel cell anode purge process [83] and wind turbine peak loads [84].
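A minimal sketch of this trial-to-trial learning is given below, assuming a P-type update law, u_next[k] = u[k] + gain * e[k+1], applied to a hypothetical first-order discrete plant with a disturbance that repeats identically in every trial. The update law, the plant parameters, and the function names are assumptions for illustration; the cited applications may use different ILC schemes.

```python
import math

def run_trial(u, d, a=0.3, b=1.0):
    """Simulate one trial of y[k+1] = a*y[k] + b*u[k] + d[k], starting at y[0] = 0."""
    y = [0.0]
    for k in range(len(u)):
        y.append(a * y[k] + b * u[k] + d[k])
    return y

def ilc(reference, d, gain=0.8, trials=30):
    """P-type ILC: correct each input sample using the previous trial's error.

    Returns the learned input and the peak tracking error of every trial.
    """
    n = len(reference) - 1
    u = [0.0] * n
    history = []
    for _ in range(trials):
        y = run_trial(u, d)
        e = [reference[k] - y[k] for k in range(n + 1)]
        history.append(max(abs(v) for v in e[1:]))
        # The error at step k+1 is the first one affected by the input at step k.
        u = [u[k] + gain * e[k + 1] for k in range(n)]
    return u, history

N = 50
reference = [math.sin(2.0 * math.pi * k / N) for k in range(N + 1)]
disturbance = [0.2 * math.sin(2.0 * math.pi * k / 10) for k in range(N)]  # periodic
u_learned, peak_errors = ilc(reference, disturbance)
```

Because the disturbance is the same in every trial, the learned input ends up compensating for it without the disturbance ever being modeled; the peak error in this example shrinks from trial to trial for the chosen gain and plant.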

Flexibility and profitability
Flexibility refers to the ability of the supervisory control level to coordinate the operation among multiple loops; it forms the basis for profitability. Seeking maximum profit and minimal costs, the profitability level computes the optimal condition of the middle-level process variables. Therefore, greater flexibility makes highly interactive energy systems easier and safer to maintain at a select few operating conditions with maximal economic efficiency.

Flexibility
The supervisory control level for system flexibility is primarily accountable for the coordination of a couple of basic regulatory loops. A more flexible multivariable controller design strategy enables a swift dynamic transition back to economically optimal conditions after any disturbance.
A multivariable model is still essential and currently plays mostly a basic role in the supervisory control practice, including power generation practices. Research studies on and applications of pure DD controls are somewhat limited, presumably due to the rigorous safety requirements in the power generation process. Without a model, it is usually difficult to ensure the stability of a multivariable control system. However, the primary challenge of model-based control is model uncertainty during condition transition, device aging, and environmental change. To this end, the ML and DD techniques can improve system robustness against model uncertainty.
For conventional supervisory control applications with limited computational resources, fuzzy logic is usually used to adjust the parameters in order to improve performance. By identifying a cluster of linear models for the main steam pressure in a power plant, fuzzy reasoning was used to adjust the parameters of the decoupling PID controllers online to accommodate the uncertain conditions of the coal mill [85]. Similarly, a flatness-based intelligent fuzzy logic controller was developed for a photovoltaic/fuel cell power plant to achieve a fast and stable response to the power system [86]. A hybrid classical and fuzzy control methodology was developed to control the steam temperature and water level of a power plant boiler [87]. Model information can be incorporated to enhance the DD control performance of multivariable ADRC, such as multivariate control applications in water tank demonstration [88] and in the direct energy balance control of a thermal power plant [89].
The above supervisory control methods are now somewhat outdated in light of the rapid development of industrial computing power, which enables the application of advanced, computationally expensive control algorithms such as MPC. When a physical model is absent, the SID method is commonly used to develop a DD model for MPC. A combined method of fuzzy clustering and SID is proposed in Ref. [90], such that the multivariable coupling and operational constraints of a boiler-turbine unit can be formulated and handled under the MPC framework. For fuel cell systems without complete online measurement of all output variables, a SID method is directly embedded into the MPC to realize complete DD control [91]. Recently, DD-enhanced MPC was used in the pollution control [92] and carbon capture control [93] of coal-fired power plants. Along with DD methods, ML methods are also combined with MPC. An NN was used to train the model for MPC, and showed success in dynamic energy management systems [94]. In addition, least-squares SVMs (LS-SVMs) and PLS are respectively used to identify fuel cell systems, based on which MPC is deployed to realize fast power tracking with constraints on the operating temperature [95]. A multilayer perceptron-based MPC is proposed in Ref. [96] for the superheated-steam supply systems of nuclear power plants. The primary disadvantage of the ML-based MPC is that the closed-loop stability cannot usually be ensured.

Profitability
The economic planning level for profitability is the topmost level of power generation systems. It usually works on an hourly or even daily basis, thus having sufficient time to compute the economic reference for the lower levels. Traditionally, data-mining methods are used to compute the most economical operation from the historical data. For example, in a recent unsupervised learning application, the size of the historical data of a power plant desulfurization system was first reduced by principal component analysis (PCA), from which a fuzzy C-means clustering method was used to derive several groups with similar operating conditions. Therefore, the economic reference for the running system could be determined as the lowest desulfurization cost point of a similar group [97]. In other words, the combined PCA and clustering methods aim to search for the best point by comparing the current condition with similar operating conditions in the group it belongs to. However, this method can search only for existing conditions of the database and cannot ensure optimality. This is a different methodology from boiler combustion optimization [98]. The combustion efficiency and the amount of pollution emission are regressed in terms of a large number of boiler variables based on the LS-SVM. A genetic algorithm is then used to optimize the condition setting, balancing the combustion efficiency and pollution emission.
When it comes to renewable power generation systems, the presence of uncertain environmental variables, such as the intermittency of wind and sunlight, as well as fluctuations in the consumptive load, makes economic planning more difficult. To this end, a reasonable forecast for each uncertain variable is critical for the next-step profitability decision. This is probably the most active area of research within the power generation domain, with a large volume of literature investigating a wide variety of ML algorithms. Taking wind power forecasting as an example, various artificial NN (ANN) structures, including feed-forward, time-series, recurrent, and deep NN, have been used to map different weather variables to a series of deterministic wind power prediction values in terms of different time scales (e.g., daily, weekly, and monthly) [99,100]. The statistical properties of wind power generation are evaluated by Bayesian methods, such as sparse Bayesian learning [101], the Bayesian nonparametric approach [102], and the Markov chain Monte Carlo (MCMC) approach [103], to derive a probabilistic distribution over a certain range. Recently, an ensemble two-layer ML model was developed to produce both deterministic and probabilistic wind power forecasts, in which the weather variables (temperature, humidity, pressure, and wind direction) were preprocessed via a deep feature selection block [104]. State-of-the-art solar and load forecasting methods are similar to those of wind, as reviewed in the literature [105,106].
With the forecasting of intermittent renewables and uncertain loads, it becomes possible to optimize the economic planning of hybrid power generation and energy storage systems. RL appears to be a promising DD solution because it remains accurate when handling optimization problems with uncertainty, even without a model. Inherited from the Markov decision process (MDP) framework, RL is described by a set of agent states within an environment, a set of possible actions for each agent, and rules governing dynamic transition, preference, and observation [107]. By interacting with the host environment (i.e., receiving observations and rewards), the RL agents choose appropriate actions to maximize the reward. To overcome the analytical challenge of traditional optimization methods [108,109], RL converts extremum seeking or economic planning to pure data-learning problems for power generation systems with or without a physical/simulation model [110]. An intuitive single-agent Q-learning example comes from the maximum power point tracking (MPPT) control of wind energy conversion systems (WECSs), where the RL agent is the wind turbine, the transition states are the rotor speed and electrical output power, the action is the speed adjustment command, and the reward is defined as the increment of electrical power output [111]. For distributed energy generation with multiple power generation sources, a multi-agent fuzzy Q-learning method has been developed, in which the agents are the controllable devices, such as the fuel cell, diesel generator, battery, desalination device, and electrolyzer. With the RL coordinated actions among these adjustable elements, the cumulative expected discounted rewards are maximized to ensure system reliability and to minimize fossil fuel consumption [112]. Deep reinforcement learning (DRL) was introduced [113] to address the complex energy Internet problem by taking advantage of the strong approximation ability of NNs.
Further examples on RL and DRL applications to power generation systems are available in a recent survey [114].
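The single-agent Q-learning formulation for MPPT can be sketched with a toy tabular example. Here the state is simplified to a discretized rotor speed index only, the action is a speed adjustment of -1, 0, or +1, and the reward is the increment of output power, following the description above. The quadratic power curve, the peak location, the hyperparameters, and all function names are hypothetical and chosen purely for illustration.

```python
import random

N_STATES = 11            # discretized rotor speed levels (hypothetical)
PEAK = 7                 # speed index giving maximum power (hypothetical)
ACTIONS = (-1, 0, 1)     # decrease, hold, or increase the speed command

def power(s):
    """Toy power curve with a single maximum at PEAK."""
    return -float((s - PEAK) ** 2)

def step(s, a):
    """Apply a speed adjustment; the reward is the increment of output power."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    return s_next, power(s_next) - power(s)

def train(episodes=2000, steps=50, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a purely random exploration policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(steps):
            i = rng.randrange(len(ACTIONS))
            s_next, r = step(s, ACTIONS[i])
            # Off-policy temporal-difference update toward r + gamma * max Q(s', .)
            q[s][i] += alpha * (r + gamma * max(q[s_next]) - q[s][i])
            s = s_next
    return q

def greedy_rollout(q, s, steps=20):
    """Follow the learned greedy policy and return the final speed index."""
    for _ in range(steps):
        i = max(range(len(ACTIONS)), key=lambda j: q[s][j])
        s, _reward = step(s, ACTIONS[i])
    return s

q_table = train()
final_speed = greedy_rollout(q_table, s=0)   # the greedy policy climbs to PEAK
```

The agent never sees the power curve itself, only the observed power increments, which is the sense in which the extremum-seeking task becomes a pure data-learning problem.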

Safety: Fault detection and diagnosis
In general, the methods used for FDD in smart power generation fall into one of two categories: model-based and DD (case-based) approaches. Model-based approaches seek a quantitative relationship among the inputs, states, and outputs of a plant, subject to potential device uncertainties. The residuals between target outputs and model projections are calculated; fault(s) are detected and isolated if the accumulated residual is greater than a prescribed threshold. Taking a coal pulverizing system as an example, an observer-based FDD model made of SS equations was established to monitor faults, such as coal leakage and mill blockage; the experimental results showed that the observer-based FDD method performed well in nominal cases [115,116]. Nevertheless, unknown disturbances or uncertainties may render the observer-based FDD model inadequate. To prevent such failures, a DD FDD method was proposed with robust residual generators that are directly constructed from the available process data to detect faults, such as an application in a wind turbine FDD in the presence of unknown disturbances and measurement noise [117]. Furthermore, DD FDD methods might not need a priori information from a plant. An application of wind turbine fault detection can be found in Ref. [118], where NN and other regression methods are compared.
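The residual-and-threshold logic can be sketched in a few lines. The sketch below assumes a sliding-window sum of absolute residuals as the accumulated residual; the window length, the threshold values, and the function name detect_fault are illustrative assumptions, and the cited studies may accumulate and evaluate residuals differently.

```python
def detect_fault(measured, predicted, window=5, threshold=1.0):
    """Return the first sample index at which the accumulated residual over a
    sliding window exceeds the threshold, or None if no fault is flagged."""
    residuals = [abs(m - p) for m, p in zip(measured, predicted)]
    for k in range(len(residuals)):
        accumulated = sum(residuals[max(0, k - window + 1): k + 1])
        if accumulated > threshold:
            return k
    return None

# Hypothetical data: the model predicts 1.0 throughout, and a sensor bias of
# 0.5 appears from sample 10 onward.
predicted = [1.0] * 20
measured = [1.0] * 10 + [1.5] * 10

alarm_index = detect_fault(measured, predicted, threshold=1.0)   # flags sample 12
missed = detect_fault(measured, predicted, threshold=3.0)        # None: fault missed
```

In this example the fault is flagged two samples after it appears with a threshold of 1.0, and is missed entirely with a threshold of 3.0, since the windowed sum never exceeds 2.5.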
DD case-based approaches also view historical samples with different fault types as patterns located in a hybrid feature space composed of inputs, outputs, and/or states. New observations are then compared against historical patterns to determine whether faults are present; if so, the detected fault will be assigned to the most similar fault type known. DD case-based approaches solve a classification problem, whereas model-based approaches solve a regression problem. In other words, any classification algorithm can be repurposed and deployed as an FDD model. For example, fuel cell FDD is carried out based on classification algorithms [119]. SVM and adaptive neuro-fuzzy inference system (ANFIS) classifiers were investigated in Ref. [120] to recognize the faulty conditions in a steam turbine unit. For other FDD models based on classification algorithms in energy systems, interested readers can refer to Ref. [121] and the literature therein.
The pervasive uncertainty hinders FDD application in power generation systems. The threshold values are usually determined by the users. In fact, almost all FDD models are sensitive to the user-given threshold: A small threshold would lead to many false alarms, whereas a large threshold would allow genuine faults to go undetected. There currently exists no general and widely accepted method to deal with imprecision and uncertainties, or to preset the user-given threshold in FDD. In addition to the above two issues, the safety protocols of power plants make it too expensive to obtain faulty samples (i.e., training samples). Therefore, FDD must be carried out based on normal operational data.
Because of these uncertainties, classical DD case-based approaches often struggle to identify potential faults by comparing new observations against historical operational data. To mitigate this difficulty, the Dempster-Shafer (DS) theory of evidence [122–125] extends probability theory and provides a general framework to interpret imprecision and uncertainty by assigning belief over the power set of the frame of discernment, that is, the set of all fault types/classes. Theoretically, given a set of $c$ normal cases (or fault types) $X = \{x_1, x_2, \ldots, x_c\}$, probability theory defines a probability distribution $p: X \to [0, 1]$, whereas DS replaces it with a mass function $m: 2^X \to [0, 1]$. A mass function can therefore describe not only the belief $m(\{x_q\})$ that an observation belongs to the normal case $\{x_q\}$, but also the belief that it belongs to a transient case (e.g., $\{x_q, x_{q+1}\}$), as well as the belief assigned to the ignorance $X$. In particular, when an observation has large ignorance $m(X)$, such as $m(X) \approx 1$, it would be identified as a new normal case (including new cases deteriorated from the existing normal cases) or an unknown fault. In general, DS provides a more powerful tool for FDD to deal with imprecision and uncertainties in comparison with either probability theory or fuzzy set theory. Therefore, implementing DD case-based approaches in the DS framework can yield more meaningful interpretations for FDD, including the detection of normal cases, transient cases, new normal cases, deteriorated cases (from normal cases), and unknown fault cases, as illustrated in Fig. 3. DS theory has found several FDD applications in power generation systems. For example, a multi-sensor fusion and decision method based on DS theory and classification and regression trees was proposed in Ref. [126] for the diagnosis of a high-voltage circuit breaker (HVCB) used to protect power generation systems in case of contingencies.
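To make the mass-function notation concrete, the following sketch represents a mass function as a mapping from subsets of the frame to masses and fuses two sources with the standard Dempster rule of combination, which the text above does not spell out. The two-case frame and the numerical masses are invented for illustration only.

```python
from itertools import product

def combine(m1, m2):
    """Fuse two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    fused = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            fused[intersection] = fused.get(intersection, 0.0) + mass_b * mass_c
        else:
            # Mass that would fall on the empty set measures the conflict.
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {a: mass / (1.0 - conflict) for a, mass in fused.items()}

# Hypothetical frame with two cases; each source keeps some mass on the ignorance X.
X = frozenset({"x1", "x2"})
m_sensor_a = {frozenset({"x1"}): 0.6, X: 0.4}
m_sensor_b = {frozenset({"x2"}): 0.3, X: 0.7}
fused = combine(m_sensor_a, m_sensor_b)
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 4))
```

The mass remaining on the full set `X` after fusion is the ignorance $m(X)$; in this representation it is a first-class quantity rather than being spread over the individual cases, as a probability distribution would require.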
Building on the basic idea of Refs. [127,128], Ref. [129] established an FDD model based on the evidential k-nearest neighbor (EKNN) classification rule to perform monitoring and early warning on two practical equipment units in a thermal power plant.
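A minimal sketch of an evidential k-nearest-neighbor rule is given below. It follows the commonly used form in which each neighbor supports its own class with a mass that decays with squared distance and leaves the remainder on the ignorance; the neighbor masses are then fused with Dempster's rule. The parameter values (`alpha`, `gamma`, `k`) and the synthetic data are assumptions, and the sketch is not the specific model of Ref. [129].

```python
import numpy as np

def eknn_masses(x, train_x, train_y, n_classes, k=3, alpha=0.95, gamma=1.0):
    """Return (masses on each class singleton, mass on the ignorance X)."""
    train_x = np.asarray(train_x, dtype=float)
    d2 = np.sum((train_x - np.asarray(x, dtype=float)) ** 2, axis=1)
    singles = np.zeros(n_classes)  # masses on {x_q}
    omega = 1.0                    # mass on X; start from total ignorance
    for i in np.argsort(d2)[:k]:
        support = alpha * np.exp(-gamma * d2[i])  # neighbor's mass on its class
        q = train_y[i]
        # Dempster's rule when all focal sets are singletons or X.
        updated = singles * (1.0 - support)
        updated[q] = singles[q] + omega * support
        updated_omega = omega * (1.0 - support)
        total = updated.sum() + updated_omega     # equals 1 minus the conflict
        singles, omega = updated / total, updated_omega / total
    return singles, omega

# Two hypothetical normal operating cases in a two-feature space.
train_x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
train_y = [0, 0, 1, 1]
near_singles, near_omega = eknn_masses([0.05, 0.0], train_x, train_y, n_classes=2, k=2)
far_singles, far_omega = eknn_masses([5.0, 5.0], train_x, train_y, n_classes=2, k=2)
print("near a known case: m(X) =", round(near_omega, 4))
print("far from all cases: m(X) =", round(far_omega, 4))
```

An observation far from every stored normal case receives almost all of its mass on the ignorance, which corresponds to the situation in which it would be flagged as a new normal case or an unknown fault, even though only normal operational data were used to build the model.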

Fig. 3. DD FDD in the framework of DS theory.

Conclusions
ML and DD control methods have proven to be promising alternatives to traditional model-based methods at all levels of smart power generation systems operation, especially in the presence of uncertainty. This paper formulated the objectives and primary uncertainties at each level and reviewed how ML and DD methods can help in improving visibility, maneuverability, flexibility, profitability, and safety (the 5-TYs). For dynamic modeling that is subject to stochastic noise uncertainty, DD system identification methods play an important role in deriving algebraic models in the form of transfer functions and SS. In addition, ML-based regression methods have been revealed to be more powerful in characterizing nonlinear multivariable energy systems when big data is available. In addition to dynamic characterization, the visibility of the internal energy systems can be significantly enhanced by DD soft sensing. Based on the visibility information, the regulatory control level can improve the device maneuverability by utilizing a suitable DD control method for a specific type of uncertain disturbance. A first-principles model is still essential for the supervisory multivariable control level, but DD methods can be embedded into the MPC framework to enhance the flexibility of power generation systems against model uncertainty. The economic planning level relies heavily on ML methods to accommodate large-scale energy system optimization problems that are subject to various uncertainties. To improve system safety in the case of unknown faults, the DD DS theory shows great potential in the FDD of power generation systems when only the normal operation data are available. At present, EMPC still relies heavily on the process model, and the literature lacks in-depth studies on implementing ML algorithms into a combination of the supervisory and planning levels.
Furthermore, compared with the booming development taking place in ML and data science, a large gap remains between the latest ML algorithms, such as deep learning, and present applications in smart power generation systems. The primary difficulty prohibiting further applications of EMPC is the huge computational time required by online optimization; efficient computation of EMPC is therefore a promising topic for future research. This undervalued but exciting field is still largely in its infancy; ML and DD methods hold great potential for improving power system efficiency for a more sustainable future.