Artificial intelligence and machine learning approaches to energy demand-side response: A systematic review

Abstract
Recent years have seen an increasing interest in Demand Response (DR) as a means to provide flexibility, and hence improve the reliability of energy systems in a cost-effective way. Yet, the high complexity of the tasks associated with DR, combined with their use of large-scale data and the frequent need for near real-time decisions, means that Artificial Intelligence (AI) and Machine Learning (ML), a branch of AI, have recently emerged as key technologies for enabling demand-side response. AI methods can be used to tackle various challenges, ranging from selecting the optimal set of consumers to respond, learning their attributes and preferences, dynamic pricing, and scheduling and control of devices, to learning how to incentivise participants in DR schemes and how to reward them in a fair and economically efficient way. This work provides an overview of AI methods utilised for DR applications, based on a systematic review of over 160 papers, 40 companies and commercial initiatives, and 21 large-scale projects. The papers are classified with regard to both the AI/ML algorithm(s) used and the application area in energy DR. Next, commercial initiatives (including both start-ups and established companies) and large-scale innovation projects in which AI methods have been used for energy DR are presented. The paper concludes with a discussion of the advantages and potential limitations of the reviewed AI techniques for different DR tasks, and outlines directions for future research in this fast-growing area.


Introduction
The growing trend towards Renewable Energy Resources (RES), and their rapid development in recent years, poses key challenges for power system operators. To accommodate this new energy generation mix, energy systems are forced to undergo a rapid transformation. The majority of RES are characterised by variability and intermittency, making it difficult to predict their power output (e.g. because they depend on solar irradiation or wind speed). These attributes make the operation and management of power systems more challenging, because more flexibility is needed to safeguard their normal operation and stability [1]. The main approaches for providing flexibility are the integration of fast-acting supply, demand side management, and energy storage services [2].
In addition, power systems operation is entering the digital era. New technologies, such as Internet-of-Things (IoT), real-time monitoring and control, peer-to-peer energy and smart contracts [3], as well as cyber-security of energy assets can result in power systems which are more efficient, secure, reliable, resilient, and sustainable [4]. Moreover, several countries (both in the EU and worldwide) have set ambitious targets for mass deployment of advanced metering infrastructure (AMI) [5]; for example, in the UK, the Office for Gas and Electricity Markets (Ofgem) has stated a target of 53 million electricity and gas smart meters to be installed by 2020 [6].
The massive amount of data generated by this infrastructure (IoT, AMI) calls for automated ways of analysing it. Additionally, the shift to more active, decentralised, and complex power systems [7] creates tasks which can quickly become unmanageable for human operators. AI approaches have been identified as a key tool for addressing these challenges in power systems. AI can be used to forecast power demand and generation, optimise maintenance and use of energy assets, better understand energy usage patterns, and improve the stability and efficiency of the power system. AI can also alleviate the load on humans by assisting and partially automating decision-making, as well as automating the scheduling and control of the multitude of devices used.

Motivation and scope of the review
Artificial Intelligence (AI) approaches have been utilised across a range of applications in power systems, but have only recently begun to attract significant research interest in the field of demand-side response. Demand response (DR) has been identified as one of the promising approaches for providing demand flexibility to the power system; thus, increasing the scale and scope of DR programmes is of key importance to many system operators. This enhanced function of DR schemes requires a framework which is automated, able to adjust to a dynamic environment, and able to learn (e.g. consumers' preferences). This framework can be created with the assistance of AI techniques; in fact, it is increasingly apparent that AI can contribute greatly to the future success of DR schemes by automating the process while learning the preferences of end-use consumers.
The rising interest in AI-based solutions in the DR sector is well illustrated by the sharp increase of research interest in this domain. The number of scientific publications on the subject has seen an order of magnitude increase (around 15 times) between 2012 and 2018, as shown in Fig. 1. This trend has intensified the need for a systematic review to summarise the AI algorithms used for the various DR application areas. In fact, most of these works, while providing valuable contributions, tend to focus on exploring only a specific AI/ML technique and application domain. In our view, the rapid development of the field highlights the need for a comprehensive review that traces the evolution of the field, and acts as a guide for the most promising AI techniques used in specific sub-areas of DR, based on the existing body of knowledge reported so far in existing publications.
Against this background, the aim of our paper is to provide a systematic review of the various AI data-driven approaches for DR applications. The goal of our review is three-fold:
• First, we aim to provide a comprehensive overview of the AI techniques underpinning this area, as well as the main specific applications/tasks in energy DR to which these techniques have been applied, thereby offering a broad perspective of the field's evolution and potential future research paths.
• Second, we see our review serving as a useful guide for researchers and practitioners in the field. More specifically, this means informing them, for example, which AI techniques have been found to work best for their specific DR problem or application area (or at least which techniques have been mainly used by prior research in the energy DR space). This includes a systematic discussion of the advantages and drawbacks of using a specific AI technique in each application domain.
• Third, we go one step beyond looking only at scientific papers and give some insights into the start-ups and more established companies applying these techniques, as well as into some of the industrially funded research projects in this area. As this is a very active field, which has seen considerable interest and investment, our review identifies no fewer than 40 companies/commercial initiatives and 21 large-scale projects.
To the best of our knowledge, this is the largest and most comprehensive review to date of the area of AI application in energy demand-side response. More specifically, it includes 161 studies/papers (summarised in Table 1 of the Appendix), 40 companies and commercial ventures (summarised in Table 2), and 21 large-scale research projects (summarised in Table 3).

Related reviews
There are numerous papers which have reviewed the energy demand response literature. In a more general setting, Siano [8] investigated the potential benefits of DR in smart grids, along with smart technologies, control, monitoring and communication systems, while Haider et al. [9] focused on the developments in DR systems, load scheduling techniques and communication technologies for DR. O'Connell et al. [10] examined the long-term and less intuitive impacts of DR, such as its effect on electricity market prices and its impact on consumers. There has also been work that has surveyed the economic impact of DR [11], whereas Dehghanpour and Afsharnia [12] examined the technical aspect of DR for frequency control. Moreover, Vardakas et al. [13] reviewed various optimisation models for the optimal control of DR strategies, along with DR pricing schemes.
More specifically, regarding AI approaches for DR, Shareef et al. [14] reviewed literature that utilises AI techniques for the development of the schedule controller in a home energy management system (HEMS) which incorporates a DR tool. Dusparic et al. [15] focused on the comparison and evaluation of a number of self-organising intelligent algorithms for residential demand response, Yi Wang et al. [16] on load profiling in terms of clustering techniques, and Vázquez-Canteli and Nagy [17] focused only on the application of reinforcement learning for DR. Furthermore, Raza and Khosravi [18,19] surveyed AI-based load forecast modelling work, focusing mainly on artificial neural networks (ANNs), Merabet et al. [20] reviewed the application of multi-agent systems (MAS) in smart grid technologies, including DR, and there has also been work which examines smart meter data analytics in applications for DR programmes [21]. Finally, Wang et al. [22] focus on the emerging concept of integrated demand response, integrating various energy types and vectors (not just electricity, but also natural gas and heat), while Lu et al. [23] focus on the aggregation of thermal inertia, especially from district heating networks. In contrast, our review focuses mostly on electrical demand, discussing in more depth the AI techniques that can enable this process.
It is noted that, while these aforementioned reviews (which look at AI technologies for DR applications) have been very valuable, they tend to be smaller and narrower in scope. They often focus either on a specific AI technique, such as reinforcement learning [17], or on a specific application setting, such as home energy management systems [14]. By contrast, the purpose of this paper is to provide a more comprehensive and holistic view of the AI techniques used in DR schemes which support power system operation. We argue that a systematic review of this scale and scope is needed and useful to highlight potential research gaps and point to future research paths in this rapidly growing area.

Literature search strategy
The methodology utilised to find the relevant literature for review is displayed in Fig. 2. The main tool used for identifying relevant literature has been the Scopus¹ search engine, which is the largest abstract and citation database of peer-reviewed literature.² The queries used in the search engine are the following:
• "Artificial Intelligence" AND "Demand Response"
• "Machine Learning" AND "Demand Response"
• "Neural Networks" AND "Demand Response"
All the results returned by the Scopus queries have been carefully reviewed and filtered. The works included in this review are the papers in which AI approaches have explicitly been used for demand-side response applications, rather than papers that merely belong to the wider energy domain.

Structure of the review
The remainder of this paper is structured as follows. First, Section 2 provides the fundamental background for our review, by introducing DR and its relationship to the electricity grid and energy markets. The subsequent two sections show the classifications of the reviewed literature, along with providing basic AI concepts and an initial discussion.
In Section 3 the reviewed papers are categorised based on the type of AI algorithm(s) that is utilised, while in Section 4 these papers are classified based on the DR application area of the AI techniques. Next, Section 5 presents an overview of some of the key commercial use cases and industrially funded research projects, where AI approaches have been employed to perform DR. Section 6 outlines which groups of AI techniques have been applied for each DR application area and focuses on the discussion of the strengths, limitations, and the potential implications of using these specific AI approaches for the respective DR application areas. Moreover, the main findings of the study are discussed, along with a presentation of potential directions for future research. Finally, Section 7 concludes this review paper.
¹ https://www.scopus.com/home.uri.
² By contrast to Scopus, other well-known scientific databases such as ISI Web of Science cover mostly journals, and provide less coverage of conference proceedings and other dissemination venues popular in the AI/ML area, while other databases such as IEEE Xplore and ACM Digital Library cover mostly publisher-specific sources.

Demand response operation and market structure
The traditional model of the electric grid feeds electricity to the end consumers through a unidirectional power flow. This flow is supplied by high voltage generators, which are centrally controlled. With the development of markets for grid services and the growing proportion of Distributed Energy Resources (DER) in the energy mix, demand side management and especially demand response have emerged as smart solutions to reliably and efficiently manage the electric grid. However, in contrast to traditional power grids, a DR model requires a bidirectional communication mechanism and smart algorithms to process the generated data. Consequently, smart metering devices are essential for DR models, and they are one of the key components in a smart grid [24]. Additionally, the data produced can be utilised by AI-based solutions to further facilitate DR programmes.
The focus of this section is to introduce and present DR services, as well as describe how they fit in the current electricity market structure.

Demand response
Energy demand response in broad terms can be considered as one of the mechanisms within demand side management [25], made possible by ongoing smart grid activities. In this paper, the term Demand Response refers specifically to the changes in electricity usage by end-use customers (industrial, commercial, or domestic). The customers commit to change their normal consumption patterns by temporarily using on-site standby generated energy, or by reducing/shifting their electricity consumption away from periods with low generation capacity, in response to a signal from a system operator or a service provider (i.e. aggregator) [25]. We acknowledge that DR is a broader term (i.e. including thermal energy, gas, etc.), but the focus in this paper is on electrical power systems. There are numerous types of DR programmes, and their most frequently used classification is based on which party initiates the demand reduction [8]. As displayed in Fig. 3, DR schemes can be partitioned into two classes [9,26,27].
• Price-based DR programmes. In this setting the price of electricity changes over different time periods, with the purpose of motivating end-use consumers to vary their energy consumption patterns. Schemes that fall under this category are time of use (ToU), critical-peak price (CPP), and real-time price (RTP) [10].
• Incentive-based (or contract-based) DR schemes. This type of scheme incentivises end-use consumers to reduce their electricity consumption upon request or according to a contractual agreement. Examples of this kind of programme are direct-load controls (DLCs), interruptible tariffs, and demand-bidding programmes [28].
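As a rough illustration of how a price-based scheme such as ToU creates an incentive to shift consumption, the following Python sketch compares a household's daily bill before and after moving one flexible load out of the peak window. The tariff values, the peak window, the load profile and all function names are hypothetical and chosen only for illustration; they are not taken from the reviewed literature.

```python
# Minimal sketch of a time-of-use (ToU) tariff; all numbers are hypothetical.
PEAK_HOURS = range(17, 21)   # assumed peak window: 17:00-21:00
PEAK_PRICE = 0.30            # assumed price per kWh during the peak window
OFF_PEAK_PRICE = 0.12        # assumed price per kWh at all other times


def tou_price(hour):
    """Return the assumed ToU price (per kWh) for a given hour of the day."""
    return PEAK_PRICE if hour in PEAK_HOURS else OFF_PEAK_PRICE


def daily_cost(load_kwh):
    """Cost of a 24-value hourly load profile (kWh per hour) under the tariff."""
    return sum(tou_price(hour) * energy for hour, energy in enumerate(load_kwh))


def shift_load(load_kwh, energy, from_hour, to_hour):
    """Move `energy` kWh from one hour to another; total consumption is unchanged."""
    shifted = list(load_kwh)
    shifted[from_hour] -= energy
    shifted[to_hour] += energy
    return shifted


baseline = [0.5] * 24
baseline[18] += 2.0  # flexible appliance (2 kWh) run at 18:00, inside the peak
shifted = shift_load(baseline, 2.0, from_hour=18, to_hour=23)

print(f"baseline cost: {daily_cost(baseline):.2f}")
print(f"shifted cost:  {daily_cost(shifted):.2f}")
```

The total energy consumed is the same in both profiles; only its timing changes, which is the behaviour a price-based programme aims to encourage.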
Each of these control strategies requires designing the incentives or contracts that are proposed to the consumers, while taking into account the consumers' behaviours and preferences. To achieve this goal, DR solutions extensively use AI-based solutions, as is shown in Section 3.
In the next subsection, the main principles of electricity markets are described, and we explain how DR is used as a key tool to maintain the integrity of electricity grids.

Electricity markets and their relationship with demand response
Electricity markets are split between retail markets, in which electricity retailers contract the supply of electricity with the end-users, and wholesale markets, in which retailers, suppliers, producers, grid operators and third parties such as aggregators interact to allow retailers to supply their customers while maintaining the integrity of the grid. The wholesale electricity market is split into the energy market, the capacity market, and the ancillary services market, all of which are designed to provide economic incentives to different stakeholders to contribute to the energy supply and to the grid operation and integrity. Demand-side response is associated with the energy and ancillary services markets. Depending on the country, contracts between the market stakeholders can be made through bilateral trades (over the counter (OTC)) or through an organised market (exchanges, pool auction with price clearing). In both cases, the products can be traded in the spot market (day ahead and/or intra-day), or in the TSO-managed spot market for ancillary services.
Once a resource supplier commits to provide a certain amount of energy into the grid, compliance is expected; otherwise there is a penalty incurred. Thus, it is of great importance for DR aggregators to make sure that end-users commit and provide the power flexibility. Below, we briefly describe the different stakeholders that interact through electricity markets and are related to DR mechanisms.

Electricity markets stakeholders
The main stakeholders in an electricity market are the following:
• Grid operators: the Transmission System Operator (TSO) is a facilitator of the markets who ensures that every trade meets the grid constraints. Also, TSOs usually operate the ancillary services markets. TSOs and Distribution System Operators (DSOs) can buy or sell products in all markets.
• Retailers and suppliers: they participate in both the retail and wholesale markets, and they make sure that the quantity of energy purchased on the wholesale market balances the consumption of the end-users in their portfolio. To achieve this balance, they can either sub-contract to balance responsible parties (BRPs) or manage their portfolio themselves. They can propose particular contracts to the end-customers, such as flat tariffs or DR programmes. When proposing DR programmes, the challenge for suppliers is to assess how these programmes will affect their portfolio's consumption. Therefore, the AI-based tools reviewed in this paper are important for suppliers, as they provide solutions to reduce losses due to portfolio imbalance.
• End-customers: they buy electricity from a supplier. When they subscribe to a DR programme, they can either respond manually to a request or a price, or automatically through a home energy management system. AI methods reviewed in this paper also address the challenges faced by end-customers' HEMS.
• Balance Responsible Parties: they are responsible for balancing the portfolio of their customers (retailers/suppliers). They purchase electricity production or consumption in the wholesale market.
• Producers: they produce electricity and offer their production at a particular price on the wholesale markets. Their products can be energy and/or grid services such as frequency response.
• Aggregators and service providers: they aggregate end-customers or small producers in order to reach the minimum capacity allowed to provide flexibility products in the energy and ancillary services markets. Hence, they have direct contracts with end-customers, and offer their aggregated flexibility to suppliers or BRPs in the wholesale market. As with retailers/suppliers, they must ensure that end-customers will commit to the flexibility that was traded in the wholesale market. Hence, the AI tools reviewed in this paper apply particularly to aggregators, in order to minimise the difference between the traded and actual flexibility.
The remainder of this subsection describes the different markets listed previously and specifies how DR products can participate in these markets.

Capacity markets
In these long-term markets, the regulators ensure that the production capacity for the following years will meet the evolution of the demand. DR products are rarely exchanged within these markets.

Energy markets
These are the main markets that allow retailers to buy electricity from electricity producers. In these markets, retailers or suppliers are usually required to maintain a balanced portfolio at every market time interval, with as much electricity consumption as electricity production, in order to maintain the frequency of the grid at its nominal level. DR is a particular product exchanged in this market in order to allow suppliers to adjust their demand and maintain balance at every time interval.
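The balancing requirement described above can be made concrete with a small Python sketch. It computes, for each market time interval, the demand change a supplier would need from DR to bring consumption back in line with its purchases. The function name and the energy volumes are hypothetical, and real settlement rules differ between markets.

```python
def required_demand_change(purchased_mwh, consumed_mwh):
    """Demand change per interval that would balance the portfolio.

    Negative values mean demand should be reduced (consumption exceeds the
    energy purchased); positive values mean demand could be increased.
    """
    if len(purchased_mwh) != len(consumed_mwh):
        raise ValueError("both series must cover the same market intervals")
    return [p - c for p, c in zip(purchased_mwh, consumed_mwh)]


# Hypothetical volumes for four consecutive market intervals (MWh).
purchased = [10.0, 12.0, 15.0, 11.0]
expected_consumption = [10.0, 13.5, 14.0, 11.0]

for interval, change in enumerate(required_demand_change(purchased, expected_consumption)):
    print(f"interval {interval}: demand change needed = {change:+.1f} MWh")
```

In this sketch the second interval would call for a 1.5 MWh demand reduction, which is the kind of adjustment a DR product traded in the energy market is meant to deliver.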

Ancillary services markets
Electricity can be considered as a product carried by the electric grid that must satisfy contractual characteristics and requirements. The electric grid operator is responsible for ensuring that these requirements are met, in exchange for remuneration. Electric grid regulation can be summarised as the control of the grid frequency, of the voltage at each node of the grid, of the power quality (harmonics, flickers, etc.), and also the control of downtime minutes per customer per year. To ensure that these controls are well provided, the System Operator makes sure that a portion of producers and consumers contribute to these services, either by providing market-based incentives, or by setting up mandatory requirements.
These services are called ancillary services. Specific ancillary services markets can be distinguished depending on the type of product that is required. For example, the Australian Energy Market Operator currently facilitates eight separate markets that can be classified into frequency control ancillary services markets, network control ancillary services markets, or system restart ancillary services markets [29]. In most countries, to contribute to the ancillary services market managed by the TSO, it is first necessary that the resource (a generator, a battery or a load) is certified by the system operator [30]. Demand response can mainly contribute to two of these services, which are frequency control, at a national scale, and voltage control, at a local level. Indeed, although DR is mostly associated with frequency control in current practice, it could also provide local voltage support as it involves assets that are potentially available at every node of the grid.
1. Frequency Control. For the effective operation of the power grid, system operators (SOs) are required to keep the power system frequency within a range of specific acceptable values. In the majority of cases, this range has a central value of either 50 Hz or 60 Hz, depending on the national power system. In order to maintain the system frequency within the acceptable boundaries, the active power generated and/or consumed needs to be controlled to keep demand and supply balanced at all times. When demand is higher than generation, the system frequency decreases, and vice versa. This type of control is achieved by keeping a particular volume of active power as reserve, usually called frequency control reserve [31]. In general, based on the Continental European synchronous area framework [32] (former UCTE), as can be seen in Fig. 4, there are three levels of control used to balance the demand and supply [31]:
(a) Frequency Containment Reserve (FCR), also called primary frequency control, is a local automatic control that changes the active power production and the consumption of controllable loads to restore the balance between power supply and demand [33], with a maximum activation time of 30 s. This level was introduced to control the frequency in the event of large generation or load outages. Both the supply and demand side participate in this control with the use of self-regulating equipment. For comparison purposes, in the US market this frequency response corresponds to the Regulation response provided by the automatic governor of the turbines and the automatic generation control [34,35]. Most of the FCR is currently provided by gas turbines, hydro power plants, and storage such as batteries or flywheels. However, these technologies also have a negative impact on the environment [36]. In many cases, DR solutions are both the most cost-effective and the most environmentally friendly technology to provide this service, if well coordinated.
(b) Automatic Frequency Restoration Reserve (aFRR), also called secondary frequency control, is a centralised automatic control that fine-tunes the active power production of the generating assets to reinstate the frequency and the interchanges with other systems to their target range after an imbalance event. Secondary frequency control is used in all large interconnected systems and the activation time generally ranges between 30 s and 15 min (depending on the specific requirements of the interconnected system). This regulation is provided in the US by the Spinning Reserve and Regulation response.
(c) Manual Frequency Restoration Reserve (mFRR) and Replacement Reserve (RR), also called tertiary reserve, involve manual changes in the dispatching and commitment of generating units. This reserve can be used to replace secondary reserve when the secondary reserve is not enough to regulate the frequency back to its nominal value. mFRR response can be below 15 min, whereas RR activation time varies from 15 min up to hours. The purpose of this type of control includes the recovery of the primary and secondary frequency control reserves, the management of congestions in the transmission network, and the restoration of frequency to its intended value when secondary control has not been successful. In the US market, the tertiary response corresponds to Non-Spinning Reserve and Replacement Reserve.
Different countries have different power systems, resulting in different implementations, and also diverse descriptions for the reserves related to each type of frequency control [31]. For example, in the UK the SO (National Grid) has an obligation to control the system frequency at 50 Hz ±0.4% (49.8 Hz to 50.2 Hz) for operational limits [37]. Moreover, in the UK and Sweden there is no reserve defined for secondary frequency control, and there is a division of the primary frequency control reserves in various categories.
Providing frequency control becomes more challenging due to the higher penetration of intermittent renewable energy sources in the power generation mix, resulting in lower inertia in the system [38,39], and the introduction of new types of loads with higher variability (e.g. EVs) [40]. This fact calls for research and use of novel techniques and flexibility, including DR at the end-user level, which requires AI-based solutions to increase this flexibility provision.
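The frequency control concepts above can be summarised in a short Python sketch. It computes the operational band implied by a relative tolerance around the nominal frequency (50 Hz ±0.4% in the UK example), indicates the direction in which frequency moves for a given power imbalance, and lists the reserve levels whose activation time a resource could meet. The function names, the example power values and the treatment of RR as unbounded are illustrative assumptions; real prequalification involves further criteria (certification, minimum capacity, delivery duration) that are not modelled here.

```python
def operational_band(nominal_hz=50.0, tolerance=0.004):
    """Return the (low, high) frequency limits for a relative tolerance."""
    delta = nominal_hz * tolerance
    return nominal_hz - delta, nominal_hz + delta


def frequency_trend(demand_mw, generation_mw):
    """Direction in which system frequency moves for a given power imbalance."""
    if demand_mw > generation_mw:
        return "falling"
    if demand_mw < generation_mw:
        return "rising"
    return "steady"


# Maximum activation times in seconds, following the values quoted in the text.
# RR is described only as "15 min up to hours", so it is left unbounded here
# (an assumption made for this sketch).
MAX_ACTIVATION_S = {
    "FCR": 30,
    "aFRR": 15 * 60,
    "mFRR": 15 * 60,
    "RR": float("inf"),
}


def eligible_reserves(response_time_s):
    """Reserve levels whose activation time the resource can meet (timing only)."""
    return [name for name, limit in MAX_ACTIVATION_S.items()
            if response_time_s <= limit]


low, high = operational_band()
print(f"operational band: {low:.1f} Hz to {high:.1f} Hz")
print("frequency is", frequency_trend(demand_mw=31_000, generation_mw=30_500))
print("10 s response:", eligible_reserves(10))
print("5 min response:", eligible_reserves(300))
print("1 h response:", eligible_reserves(3600))
```

A fast-responding load can in principle serve every level, whereas a slower one is limited to restoration and replacement reserves, which is one reason response speed matters when selecting DR resources.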
2. Voltage Control. Along with frequency, voltage is a contractual characteristic which the system operators must keep within certain bounds set by the regulator. However, unlike frequency, which is mainly addressed at the transmission grid level, voltage is a challenge faced by both TSOs and DSOs. The voltage drop across a line of impedance $z = r + jx$ is due to the consumption or production of an apparent power $S = P + jQ$, and is given in Equation (1) below:
$$\Delta V \approx \frac{rP + xQ}{V} \qquad (1)$$
where $V$ is the average of the voltages at both ends of the line. Hence, a bus's voltage fluctuates continuously depending on the power that flows through the lines that are connected to this bus. Voltage control consists of the action of different mechanisms that ensure that the voltage stays within contractual boundaries at every bus of the grid. According to (1), the transmission grid and the distribution grid must be differentiated. Indeed, for the transmission grid, the resistance of the lines is small compared to their reactance. Thus, the voltage drop is mainly due to the transit of reactive power. Hence, voltage control at the TSO level is mainly realised by injection or consumption of reactive power. This control can be done by generators, synchronous condensers, capacitors or flexible AC transmission systems [41]. On the distribution grid side, on the other hand, active power and reactive power are both responsible for voltage drops. Hence, the growing proportion of DERs creates challenges in voltage profile management. While previously voltage was decreasing closer to the loads, the high penetration of DERs, such as rooftop solar panels, can increase the voltage locally by producing variable active power. To control the voltage at the distribution level, DSOs currently use tap changer mechanisms in transformers.
However, although primary substations, which connect distribution grids to transmission grids, are often equipped with online tap changer mechanisms, secondary substations are usually only equipped with de-energized tap changers, which require the disconnection of the feeding line before adjusting the voltage. Given the volatility of distributed generation and EV charging power at the low voltage level, new services are needed on the distribution grid side to maintain the voltage within acceptable limits while minimising load and generation curtailment. This is where smart solutions for residential demand-side response could prove to be very useful in practice and would make it possible for the DSO to integrate more DER and EVs in the system without costly grid reinforcements [42]. To our knowledge, no spot market has yet been implemented for voltage control at the transmission grid level, because of the need for very local solutions (mostly reactive power injection) [43]. In some countries (including most of the countries in the European Union), the MVAR service (Voltage and Reactive power control) is a mandatory service that can be contracted through bilateral or tendering trades and settled at regulated prices. At the distribution grid level, many local markets are currently under test to provide local support to the grid, including voltage support [44]. Even though open markets for contracting voltage support are not as well structured and widely adopted as those for frequency response, contracting such services will likely still be needed in the coming decades.
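To illustrate why reactive power dominates voltage drops on transmission lines while active power matters on distribution feeders, the following Python sketch evaluates the commonly used approximation of the voltage drop, (rP + xQ)/V, for two sets of line parameters. The use of this approximation, the line parameters and the power values are assumptions chosen for illustration only; they do not describe any specific network.

```python
def voltage_drop(r_ohm, x_ohm, p_w, q_var, v_volt):
    """Approximate voltage drop (in volts) along a line: (r*P + x*Q) / V."""
    return (r_ohm * p_w + x_ohm * q_var) / v_volt


def reactive_share(r_ohm, x_ohm, p_w, q_var):
    """Fraction of the approximate drop that is caused by reactive power."""
    return (x_ohm * q_var) / (r_ohm * p_w + x_ohm * q_var)


# Hypothetical transmission line: reactance much larger than resistance.
tx = dict(r_ohm=1.0, x_ohm=10.0, p_w=100e6, q_var=30e6)
# Hypothetical low-voltage feeder: resistance larger than reactance.
lv = dict(r_ohm=0.5, x_ohm=0.1, p_w=5e3, q_var=1e3)

print(f"transmission: drop = {voltage_drop(**tx, v_volt=400e3):.0f} V, "
      f"reactive share = {reactive_share(**tx):.0%}")
print(f"distribution: drop = {voltage_drop(**lv, v_volt=230.0):.1f} V, "
      f"reactive share = {reactive_share(**lv):.0%}")

# Rooftop PV exporting 4 kW reverses the sign of P, so the drop becomes a rise.
pv_export = voltage_drop(0.5, 0.1, p_w=-4e3, q_var=0.0, v_volt=230.0)
print(f"feeder with PV export: drop = {pv_export:.1f} V (negative means a rise)")
```

With these illustrative numbers, most of the drop on the transmission line comes from the reactive term, whereas on the low-voltage feeder it comes almost entirely from active power, which is why shifting or curtailing active demand can support local voltage.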
The evolution of power systems, driven by an increasing penetration of DER, calls for new solutions to address the technical challenges of a smart grid (mainly frequency and local voltage regulation). DR is one of these solutions. The installation of smart meters and the increasing adoption of IoT devices at home lay the foundations for smart DR strategies. On top of that, these strategies will rely on the implementation of smart algorithms based on AI solutions to achieve an efficient regulation of the demand without severely affecting end-users' comfort.
In the subsequent section (Section 3) we provide a comprehensive review and discussion of the AI solutions that have been proposed and investigated so far by the research community for automating DR. Next, Section 4 reviews and provides a discussion of specific DR services and areas where AI/ML techniques have been applied.

AI approaches/techniques in demand response
AI is a multidisciplinary domain employing techniques and insights from various fields, such as computer science, neuroscience, economics, information theory, statistics, psychology, control theory and optimisation. The term artificial intelligence refers to the study and design of intelligent entities (agents) [45]. These intelligent agents are systems that observe their environments and act towards achieving goals. In this work, the adopted definition of an agent is the one presented in the seminal AI work of Russell and Norvig [45]:
"An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators."
Hence, AI-enabled agents can range from machines truly capable of reasoning to search algorithms used to play board games. Since the birth of AI in the 1950s, various approaches have been applied to create thinking machines. These approaches include symbolic reasoning [46], logic-based [47], knowledge-based systems [48], soft computing [49], and statistical learning [50,51]. The focus of this paper is on the non-symbolic, soft computing, data-driven paradigm of AI. Furthermore, to present a more holistic view in this review, the AI approaches are studied both in the single-agent and the multi-agent setting. The various AI techniques used for DR and their classification can be seen in Fig. 5, whereas Fig. 6 displays the proportion of the reviewed literature that has utilised a particular category of AI techniques.

Machine learning and statistical methods
As we enter the big data and IoT era, there is a great need for automated analysis of the "data tsunami" that is being continuously created. Machine learning comprises a set of methods that try to learn from data, and it is a core subset of AI. This group of AI techniques encompasses methods that can automatically identify patterns in data and then use these patterns to make predictions, or to perform other kinds of decision making in an uncertain environment [52]. Machine learning is a multi-disciplinary domain that draws concepts from various fields, primarily computer science, statistics, mathematics, and engineering. The main types of machine learning, as stated in Murphy [52], are supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning
In the supervised learning setting, the goal is to learn a mapping between the input vector $x$ and the outputs $y$, provided that there is an existing labelled set of input-output pairs $D = \{(x_i, y_i)\}_{i=1}^{K}$. This set of data is called the training set, and the inputs $x_i$ can range from something as simple as a real number to a complex structured object (e.g. an image, a time series, a graph, etc.). The outputs $y_i$ can in general be of any type; the two most common cases are when $y_i$ is a categorical variable, in which case we have a classification problem, and when it is a real-valued scalar variable, in which case we have a regression problem.
Supervised learning tries to tackle an inductive problem, since from a finite set $D$ we need to find a function $f$ which will give an output for the whole spectrum of possible inputs. In simpler terms, the end goal is to find a mapping that will generalise well to data that the algorithm has not encountered before. The set of unseen data which is used to measure how well the algorithm generalises is called the test set, and it should not include datapoints which are part of the training set. When an approach that is more flexible than optimal is used, so that the learned model does not generalise well to unseen data, we say that the algorithm overfits the data; when the approach is less flexible than optimal, we say that it underfits.
In DR, supervised learning techniques have been primarily applied to forecast demand and electricity prices, by employing kernel-based methods, tree-based methods, and linear regression models. ANNs, trained in a supervised fashion, are also extensively used for forecasting, but they are discussed in their own section because they are heavily utilised in research. Kernel-based methods map the input data to a new feature space, and subsequently find an appropriate hypothesis in this feature space [53]; popular kernel-based techniques include support vector machines (SVM) and Gaussian processes (GPs). Support Vector Regression (SVR) has been used in Giovanelli et al. [54] and Pal and Kumar [55] for price forecasting, whereas Yang et al. [56] and Zhou et al. [57,58] employ SVR for STLF, even for non-aggregated loads. The regression is obtained by solving the dual form of an optimisation problem as defined in Drucker et al. [59], and using Equation (2) to determine the regression function $p(x)$, with $\beta$ and $\beta^{*}$ the Lagrange multipliers, $x$ the inputs for the forecast, $b$ a primal variable and $K(x_i, x_j)$ a kernel function, often chosen as the Gaussian radial basis kernel:
$$p(x) = \sum_{i} (\beta_i - \beta_i^{*}) \, K(x_i, x) + b \tag{2}$$
Gaussian process regression models have been used to determine a probabilistic baseline estimation in Rajgopal [60] and Weng et al. [61], as well as for forecasting the consumption of controllable appliances in Tang et al. [62]. GPs have the advantage of being probabilistic models. Probabilistic approaches can potentially lead to better informed forecasts for DR, because they output an estimate of the uncertainty in the predictions of the model, not just point estimates. In addition, prior knowledge can be included in the learning algorithm, so that domain knowledge can be incorporated in the model [52].
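As an illustration of the kernel-based prediction step described above, the following minimal sketch evaluates the standard dual-form SVR regression function, $p(x) = \sum_i (\beta_i - \beta_i^{*}) K(x_i, x) + b$, with a Gaussian radial basis kernel. The support vectors, multipliers, offset and kernel width `gamma` are hypothetical hand-picked values standing in for the solution of the dual problem; they are not taken from any of the reviewed papers.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian radial basis kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, beta, beta_star, b, gamma=0.5):
    """Evaluate p(x) = sum_i (beta_i - beta_i*) * K(x_i, x) + b."""
    return b + sum(
        (bi - bsi) * rbf_kernel(xi, x, gamma)
        for xi, bi, bsi in zip(support_vectors, beta, beta_star)
    )


# Hypothetical values that a dual solver might return (assumption, for illustration).
# Each input is (normalised hour of day, normalised temperature).
support_vectors = [(0.0, 1.0), (1.0, 0.0)]
beta = [0.8, 0.0]
beta_star = [0.0, 0.3]
b = 2.0

forecast = svr_predict((0.0, 1.0), support_vectors, beta, beta_star, b)
```

In practice the multipliers would come from a quadratic programming solver or an SVR library; only the prediction step is shown here.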
Unlike kernel-based methods, linear regression is a simple tool, easily implementable, which offers good interpretability of how the inputs affect the output [52]. These attributes explain why it has been used across various domains. Even though linear regressions are often employed as a baseline algorithm by researchers to compare their proposed algorithms [54,56,58,63,64], they have also been used as the main modelling technique in MacDougall et al. [65] to forecast the flexibility of Virtual Power Plants (VPP), in Dehghanpour et al. [66] to determine the aggregated power of price-sensitive loads at each hour of the day, in Klaassen et al. [67] to forecast the aggregated power for heating as a function of the temperature, the time of the day, the type of day and the price, and in Grabner et al. [68], where multivariate linear regression is used for daily peak load estimation. In the DR space, for STLF, the output of the regression is the power of the considered load (thermal load or building) at time $t$ ($P_t$, $t \in [0, 24]$), while the inputs or features can be time-related (hour, day, type of day), the temperature ($T$), the electricity price ($p$), and/or a product of these inputs (e.g. hour × $T$, hour × $p$) to reflect the interactions between the inputs. A generic formulation can be found in Equation (3), where $P_j$ is the power consumption forecast for time $t = j$. The vectors $[I_{ij}]$ represent the inputs or features, where $i$ corresponds to one type of feature (temperature, type of day, …). For example, if three features are considered, the temperature ($i = T$), the type of day ($i = d$) and the product of the price and the hour ($i = p \cdot h$), then $I_{T,11}$ is the temperature at time $t = 11$, $I_{d,11}$ is the type of day, and $I_{p \cdot h,11}$ is the product of the price and the hour at $t = 11$.
The terms $P_{0j}$ correspond to baseline consumption or offsets, and the $a^{i}_{k,l}$ coefficients are the linear coefficients for each of the features, which are computed and updated to minimise an error function (the residual sum of squares, for example). Moreover, a few papers have utilised Gaussian copulas in DR, primarily for load forecasting. Tavakoli Bina and Ahmadi [69] applied this technique to predict EVs' charging demand for day-ahead DR strategies, Bina and Ahmadi [70] to estimate the day-ahead aggregate power demand of particular household appliances, and Bina and Ahmadi [71] to forecast non-controllable loads in day-ahead DR. Besides DR, there is also work applying Gaussian copulas in the more general power system setting, e.g. in PV power forecasting [72,73], in short-term wind power forecasting [74], and in the forecasting of inflexible loads [73].
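The linear regression set-up described above (time, temperature and price features plus their products, fitted by minimising the residual sum of squares) can be sketched as follows. This is a minimal illustration on synthetic data, not the formulation of any specific reviewed paper: the feature choice, the made-up temperature and price series, and `TRUE_WEIGHTS` are all assumptions.

```python
def build_features(hour, temp, price):
    """Intercept, temperature, price, and the hour*T and hour*p interaction terms."""
    return [1.0, temp, price, hour * temp, hour * price]


def solve_linear_system(a, y):
    """Solve a * w = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [rhs] for row, rhs in zip(a, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_least_squares(rows, targets):
    """Minimise the residual sum of squares via the normal equations X^T X w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, features):
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical coefficients, used only to generate noise-free synthetic data.
TRUE_WEIGHTS = [1.0, 0.2, -0.05, 0.4, 0.1]

samples = []
for h in range(24):
    hour = h / 24.0                # time of day as a fraction
    temp = 5.0 + (7 * h) % 11      # made-up temperature series
    price = 10.0 + (5 * h) % 13    # made-up price series
    samples.append(build_features(hour, temp, price))
targets = [predict(TRUE_WEIGHTS, row) for row in samples]

weights = fit_least_squares(samples, targets)
```

Because each coefficient multiplies a named feature, the fitted weights can be read directly as the sensitivity of the load to temperature, price, or their interaction with the hour, which is the interpretability advantage noted above.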
Tree-based methods have also been used extensively in DR for load forecasting [56-58,75,76] and price forecasting [54], where Giovanelli et al. [54] include a comparison with other methods (linear regression, SVR, gradient boosting decision tree). Yang et al. [56] use regression trees to model the energy consumption of cooling systems and compare the outputs with SVR methods. Behl et al. [76] use multiple regression trees to predict the power consumption of a building as a function of the temperature, humidity, wind, time of day, type of day, schedule, lighting level, water temperature and historic power consumption. Zhou et al. [57,58] use classification and regression tree (CART) algorithms for short-term load forecasting. Regression trees are hierarchical, non-parametric methods that segment the feature space (e.g. time of day, type of day, temperature) into a number of simple regions and subsequently fit a simple model in each one [77]. Regression trees are interpretable and scalable, and they handle missing data well; on the other hand, they can be prone to overfitting and are generally unstable [52,54]. Nevertheless, regression trees have been reported to provide accurate results even for complex prediction tasks, such as 48 h ahead predictions of aggregated demand with a time step of 15 min [75].
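To make the idea of segmenting the feature space concrete, the sketch below builds a small depth-limited regression tree that fits a constant (the mean load) in each region. It is a simplified illustration rather than a full CART implementation (no pruning, no missing-value handling), and the (hour, temperature) samples are made up.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature index, threshold) that most reduces the squared error, or None."""
    best, best_err = None, sse(targets)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            err = sse(left) + sse(right)
            if err < best_err - 1e-12:
                best, best_err = (f, threshold), err
    return best


def build_tree(rows, targets, depth=2):
    """Recursively split; a leaf is a float, an inner node is a 4-tuple."""
    split = best_split(rows, targets) if depth > 0 and len(targets) > 1 else None
    if split is None:
        return mean(targets)
    f, threshold = split
    left = [(r, t) for r, t in zip(rows, targets) if r[f] < threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] >= threshold]
    return (
        f,
        threshold,
        build_tree([r for r, _ in left], [t for _, t in left], depth - 1),
        build_tree([r for r, _ in right], [t for _, t in right], depth - 1),
    )


def predict(tree, row):
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] < threshold else right
    return tree


# Made-up samples: (hour of day, outdoor temperature) -> load in kW.
rows = [(2, 15), (4, 14), (6, 22), (10, 18), (12, 25), (14, 27), (16, 19), (20, 17)]
targets = [1.0, 1.0, 1.0, 3.0, 5.0, 5.0, 3.0, 3.0]

tree = build_tree(rows, targets, depth=2)
```

On this toy data the tree first separates night from day and then, within the daytime region, hot from mild conditions, which mirrors the region-wise interpretation given above.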
Another approach commonly found in the review for load forecasting in DR is ensemble learning. Ensemble learning is based on the idea of constructing a prediction model through a combination of multiple simpler base models (weak learners). Cheung et al. [78] propose a variation of ensemble learning called Temporal Ensemble Learning (TEL), which partitions the dataset by temporal features and forecasts demand in specific time ranges per day. The ensemble of these generated forecasts, with kernel regression as the base model, is the model that yielded the best results in that paper. Yang et al. [56] apply methods based on a voting or stacking strategy to combine weak learners based on regression trees and SVM, in order to estimate the energy consumption of buildings that have an energy management system (EMS) which responds to DR signals. Giovanelli et al. [54] use a gradient boosting decision tree (GBDT) to build an additive regression model, with regression trees as the weak learner; this model is used to predict the prices in the FCR market. Ensemble learning approaches can potentially improve the forecasting accuracy compared to the base models, as in the work of Cheung et al. [78], where ensemble learning techniques achieved higher accuracy for short-term (1-h) load forecasts than the linear regression and SVR approaches.
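The additive idea behind gradient boosting with tree-based weak learners can be sketched as follows: each round fits a one-split regression stump to the current residuals and adds a damped copy of it to the model. This is a minimal squared-error illustration under assumed settings (`n_rounds`, `learning_rate`, a made-up daily load profile); it is not the configuration used in any reviewed paper.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump on a 1-D feature (squared error)."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x < threshold else rm


def fit_gbdt(xs, ys, n_rounds=50, learning_rate=0.3):
    """Additive model: mean prediction plus a sum of damped stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Made-up hourly load profile in kW (assumption, for illustration only).
hours = list(range(24))
load = [1.0] * 7 + [2.5] * 10 + [4.0] * 4 + [1.5] * 3

model = fit_gbdt(hours, load)
baseline = sum(load) / len(load)
mse_baseline = sum((y - baseline) ** 2 for y in load) / len(load)
mse_boosted = sum((y - model(x)) ** 2 for x, y in zip(hours, load)) / len(load)
```

Comparing `mse_boosted` with `mse_baseline` shows how combining many weak learners reduces the training error relative to a single constant prediction; a held-out test set would be needed to assess generalisation.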
Besides forecasting, supervised learning models have been utilised for other tasks in DR as well. Goubko et al. [79] apply a Bayesian learning framework to estimate the consumer's comfort level function, Jia et al. [80] apply an online learning algorithm (called piecewise linear stochastic approximation) to solve the task of day-ahead dynamic pricing for an electricity retailer, and Shoji et al. [81] adapt a Bayesian network to an EMS, with the purpose of learning the residents' behaviour and controlling the appliances under varying electricity prices. Moreover, Albert and Rajagopal [82] employ AdaBoost [83] to learn an ensemble of (binary) classifiers from features generated using spectral clustering. These classifiers are used to predict certain DR user characteristics.

Unsupervised learning
In the case of unsupervised learning methods, only the inputs are given, $D = \{x_i\}_{i=1}^{K}$, where $K$ is the number of datapoints, and the system attempts to detect patterns in the data that could be of interest. Compared to supervised learning, this is a less well-defined problem, because the patterns to be detected are not known beforehand and because there is no obvious error metric to use. On the other hand, it can be applied to a wider spectrum of cases, as it does not require labelled data, which can be difficult or expensive to acquire. In DR this is advantageous due to the lack of labelled data. The usual examples of unsupervised learning are clustering the data into groups, dimensionality reduction by discovering latent factors, learning graph structure, and matrix completion.
In DR the dominant use of unsupervised algorithms has been for clustering, that is, creating groups of objects (e.g. load profiles) such that objects within the same cluster are similar to one another and dissimilar to the objects in other clusters. Various clustering algorithms have been applied to segment consumers and find typical shapes of load profiles. In turn, this grouping can be used (among other things) to identify potential households for DR schemes, select consumers for DR events, and compensate consumers for participation in DR programmes.
Clustering algorithms can be classified as hard or soft: in hard clustering each item can belong to only one cluster, whereas in soft clustering each item can belong to multiple clusters. The K-means algorithm has been employed in the majority of cases, with various distance metrics [57,68,84-89]. K-means clustering is a distance-based method whose purpose is to find $K$ centroids (points which are the centre of a cluster) and a label $c^{(i)}$ for each data point in the dataset. A data point is considered to belong to the $k$-th cluster if the distance between the data point and the $k$-th centroid is the smallest among all centroids. K-means finds the best centroids by iteratively alternating between (1) assigning data points to clusters based on the current estimate of the centroids, and (2) choosing centroids based on the current assignment of data points to clusters, until the assignments do not change [52]. In DR, K-means is mainly used to group individual households based on monitored load data, which are usually grouped by weekdays and averaged over a period of several weeks. The features used for clustering can include the important components from a Principal Component Analysis (PCA) [84,87,90] (or Self-Organising Maps (SOM) [89]), the daily load shapes directly, in which case the dimension of the considered space is the size of the load profiles (e.g. 24 for hourly monitoring intervals) [57,91], and/or particular characteristics of the households, such as the average and peak daily consumption [85], and pricing information [86,88]. Cao et al. [84] compare the clustering of 4000 households over 18 months from the Irish CER dataset, using K-means, SOM, and hierarchical clustering methods with different distance computations based on the 17 most significant PCA components. Koolen et al. [87] aim to cluster households into two groups ($k = 2$), one more suited for Time of Use tariffs, and one more suited for Real Time Pricing.
They use spectral relaxation clustering with PCA to find 9 eigenvectors that define the space for the k-means clustering. Finally, Grabner et al. [68] use k-means for substations' load profile clustering, with the dynamic time warping algorithm used to measure the distance between time series (instead of the Euclidean distance).
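The two alternating K-means steps described above (assignment, then centroid update, until the labels stop changing) can be sketched in a few lines. The example below is a minimal illustration with squared Euclidean distance; the six household load profiles (average kW over four blocks of the day) and the choice of initial centroids are made up.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def k_means(points, initial_centroids, max_iter=100):
    """Alternate (1) assignment and (2) centroid update until labels stop changing."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Step 1: assign every point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Made-up load profiles: (night, morning, afternoon, evening) average kW.
profiles = [
    (0.2, 0.5, 0.4, 1.8),
    (0.3, 0.6, 0.5, 2.0),
    (0.2, 0.4, 0.3, 1.6),
    (0.3, 1.5, 1.6, 0.6),
    (0.2, 1.7, 1.5, 0.5),
    (0.4, 1.6, 1.8, 0.7),
]

labels, centroids = k_means(profiles, [profiles[0], profiles[3]])
```

Here the first three households form an evening-peak cluster and the last three a daytime-consumption cluster. Replacing `squared_distance` with another metric (for example a dynamic time warping distance) changes the notion of similarity without changing the loop.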
Two major challenges in unsupervised cluster analysis are the estimation of the optimal number of clusters [98] and the validation of clustering structures [52]. In the DR literature the selected number of clusters is between 2 and 16, and the selection approaches include indices (e.g. Bayesian information criterion (BIC) [85], Dunn index (DI) [89], Davies-Bouldin index (DBI) [89,97], mean silhouette index (MSI) [89]), exploratory techniques [87,92,94], methods based on matrix perturbation theory [97], and iterative methods which increase the number of clusters $K$ and perform a criterion-based comparison (depending on the application) for each $K$, while making sure to avoid over-fitting [62,84,86,87,91]. Indeed, Tang et al. [62] and Kwac et al. [91] use an adaptive k-means approach to find the best number of household clusters, and Cao et al. [84] limit the number of clusters to 14 in order to avoid over-fitting. Moreover, bootstrapping techniques have been used to check the reliability of the clusters and test the robustness of the results [95]. Waczowicz et al. [99] propose an automatic framework, based on a ranking method, to compare and select the hyper-parameter values for DR clustering purposes.
Besides consumer segmentation, unsupervised techniques have also been utilised to detect the presence of heating appliances in a household [100], infer dynamic elasticity curves [92], and detect the occupancy of a household [58,82], where Albert and Rajagopal [82] use spectral clustering to group a collection of HMMs into classes with similar statistical properties. This information can be very valuable to aggregators, as it allows them to assess the flexibility of their assets.

Reinforcement learning
Learning from interaction is a fundamental idea in almost every learning paradigm. One of the most interesting computational approaches to learning from interaction is Reinforcement Learning (RL). RL is an approach which explicitly considers the whole problem of an agent focused on goal-oriented learning while interacting with an uncertain environment [101]. It is a distinct paradigm from supervised and unsupervised learning, which considers the trade-off between exploration and exploitation. Trial-and-error search and delayed reward are the two most characteristic aspects of RL. The RL problem is formalised using the concept of Markov Decision Processes (MDPs). In an MDP, at each sequential, discrete time step $t$ the agent receives a representation of the environment's state ($S_t \in \mathcal{S}$), selects an action ($A_t \in \mathcal{A}(s)$) based on the state, and finds itself in the state of the subsequent time step, $S_{t+1}$, where it receives a numerical reward ($R_t \in \mathcal{R} \subset \mathbb{R}$) because of its action $A_t$ [101].
The RL framework has been applied to a number of domains, the most important being robotics [102,103], resource management in computer clusters [104], playing video games from pixel input [105], and automated ML frameworks [106]. In the DR field, RL has been widely applied to the tasks of scheduling and control of the various units (e.g. domestic appliances, EVs), while taking into account consumers' preferences (via interaction with them). RL has been presented as a data-driven alternative to model-based controllers for DR, both at the consumer level (as part of an EMS) and at the service provider level. There is also research where the RL framework has been used to learn the DR pricing mechanism for service providers [107-109] and to develop a demand elasticity model for an aggregation of consumers [110].
The online nature⁴ of various RL methods makes them appealing for DR due to the low volume of many existing DR-related data sets. Accordingly, RL has been heavily applied, and various solution methods of the RL framework have been used. The solution methods for RL can be arranged in two classes: tabular methods, where the spaces of possible states and actions are limited enough to allow value functions to be represented as tables, and approximate methods, which can be applied to problems with arbitrarily large state spaces [101].
In DR, the most common tabular method applied is Q-learning [66,107-112]. Q-learning [113] is a temporal-difference, model-free⁵ RL technique which directly approximates the optimal action-value function, independently of the policy⁶ being followed [101]. In this case the learned expected discounted reward $Q(S_t, A_t)$ that the agent receives for executing action $A_t$ at state $S_t$ and following policy $\pi$ thereafter (action-value function) is updated as follows:
$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_t + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right] \tag{4}$$
where $\alpha \in [0, 1]$ is the learning rate, $\gamma$ is the discount factor, $a \in \mathcal{A}$, $R_t$ is the actual reward obtained for getting from state $S_t$ to $S_{t+1}$, and $\max_{a} Q(S_{t+1}, a)$ is the maximum reward the agent can expect from being in state $S_{t+1}$. Using Equation (4), the agent updates its table of expected rewards (for state-action pairs), which then allows it to find the optimal action at future times. In DR applications, Q-learning has been used to help the service provider company (aggregator) provide the optimal sequence of retail electricity prices to consumers [107-109]. In this case, the agent is the aggregator, the action is the sequence of price incentives that is proposed to the customers, the state corresponds to the energy demand from the customers, and the reward is a function of the aggregator's profit and the cost incurred by the customers. Q-learning is also frequently used at the HEMS level to optimise the scheduling of appliances by considering the cost and comfort for the users as a reward function [111,112]. In O'Neill et al. [112], the authors consider pre-specified disutility functions for the customers' dissatisfaction with job scheduling, a limitation that Wen et al. [111] address. In this context, a state is composed of a price sequence from the retailer or aggregator, a vector that reflects the user's consumption of specific appliances over time, and sometimes the priority of the considered device.
The action from the HEMS is to switch on or turn off the considered devices at time $t$, and the reward is computed based on the satisfaction (or dissatisfaction) of the customers, usually quantified by the time delay in the actual switching of an appliance, or by directly modelling the end-user's discomfort function [66,114]. Further, tabular methods are employed in Jain et al. [115] as a multi-armed bandit mechanism, which involves learning to act in only one situation (a single state), as well as in Ahmed and Bouffard [116], where the problem is formulated as a bandit problem and Monte Carlo methods are applied to learn the value of actions for a given policy.
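The tabular Q-learning update discussed above, $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [R_t + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)]$, can be illustrated on a toy HEMS-style problem. In this minimal sketch an appliance must run once during three price periods; the state is simply the period index, the reward is the negative price paid, and a fixed penalty applies if the job is never run. The prices, penalty, and hyper-parameters are hypothetical and far simpler than the state and reward definitions used in the reviewed papers.

```python
import random
from collections import defaultdict

ACTIONS = ("wait", "run")
MISSED_JOB_PENALTY = 1.0  # assumed discomfort cost if the appliance never runs


def train(prices, episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table: (period, action) -> expected discounted reward
    last = len(prices) - 1
    for _ in range(episodes):
        t = 0
        while t is not None:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(t, a)])  # exploit
            if action == "run":
                reward, next_t = -prices[t], None  # pay the price, episode ends
            elif t == last:
                reward, next_t = -MISSED_JOB_PENALTY, None  # job was never run
            else:
                reward, next_t = 0.0, t + 1
            best_next = 0.0 if next_t is None else max(q[(next_t, a)] for a in ACTIONS)
            # Temporal-difference update of the visited state-action pair.
            q[(t, action)] += alpha * (reward + gamma * best_next - q[(t, action)])
            t = next_t
    return q


prices = [0.30, 0.10, 0.20]  # made-up day-ahead prices per period
q_table = train(prices)
policy = {t: max(ACTIONS, key=lambda a: q_table[(t, a)]) for t in range(len(prices))}
```

The greedy policy read from the learned table waits through the expensive first period and runs the appliance in the cheapest one, without the agent ever being given a model of the price dynamics.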
In contrast to tabular methods, the approximate RL methods used for DR are not online algorithms, but batch or mini-batch methods. In online algorithms the input data are obtained sequentially while the learning algorithm executes, whereas in batch algorithms the entire dataset used for learning is readily available [101]. Ruelens et al. [117,118], Claessens et al. [119], and Patyn et al. [120] use Fitted Q-iteration (FQI) at the end-user level (HEMS) to allow the HEMS to determine an optimal control sequence (policy) of thermal appliances for each time step of the day based on day-ahead pricing signals. The aim for the HEMS is to minimise the daily cost of electricity demand. The FQI algorithm estimates the state-action value function (expected reward $Q(S_t, A_t)$) offline, using a batch of historical data, and approximates it using either linear regression or ANNs. A further use case of FQI at the HEMS level is the construction of an optimal day-ahead load profile, which is subsequently sold in the market. The objective in this use case is to increase consumers' profit and minimise the deviation between the day-ahead load profile proposed in the market and the actual load profile. Medved et al. [121] propose another variant of the Q-learning algorithm, called deep Q-learning [122], where action-value functions are parametrised using an ANN, whereas Bahrami et al. [123] use an actor-critic online learning method [101].

⁴ Learning happens at each time step, as data becomes available in a sequential order.
⁵ There is no need for a model of the environment.
⁶ The policy is defined as the mapping from states to probabilities of selecting each possible action. It describes the learning agent's way of behaving at a given time.
In addition to the aforementioned centralised, single-agent methods, multi-agent extensions of these have been reported in the literature. These alternative learning methods are mainly employed to address the limitations of centralised approaches in terms of the computational power needed (by distributing the workload among the participating agents), scalability, reliability of the system, and the data privacy of consumers. Hurtado et al. [114] propose a decentralised and cooperative RL method which extends the Q-learning algorithm to the multi-agent setting by incorporating the optimal policies and the actions of the other agents. Cooperation between agents has also been considered in Golpayegani et al. [124], through the use of a collaborative and parallel Monte Carlo Tree Search (MCTS), which enables EVs to actively influence the planning process and resolve their conflicts via negotiation in a DR scenario; MCTS can be considered a form of RL algorithm [125]. On the other hand, there are papers that address the problem without collaboration among the agents, using decentralised Q-learning [126] as well as W-learning [127]. Multi-agent approaches diminish the need for complex, computationally intensive algorithms compared to centralised methods, in exchange for increased collaboration and communication overhead among the agents. For a more detailed review of RL methods in DR, the reader can refer to the work of Vázquez-Canteli and Nagy [17].

Nature-inspired algorithmics
Natural and biological systems have always been a key source of inspiration for scientists designing novel computational approaches. In the context of AI, nature-inspired algorithms have been utilised for searching and planning purposes, i.e. to find the sequences of actions needed to reach an agent's goals [45]. The nature-inspired algorithms found in the DR literature are often meta-heuristics motivated by evolution, biological swarms, or physical processes. The term meta-heuristics refers to the class of stochastic algorithms with randomisation and local search, and is used to denote the set of iterative processes which augment heuristic procedures, via intelligent learning strategies for the exploration and exploitation of the search space, with the goal of efficiently discovering near-optimal solutions [128].
In DR, nature-inspired algorithms have been primarily used to schedule loads or appliances at the consumer level (algorithm embedded in HEMS) or help aggregators and retailers to optimise the pricing of their customers who offer DR services. Since meta-heuristics are able to find solutions in a reasonable timeframe, they have been heavily utilised under the DR context, where the scheduling task can be computationally expensive.

Evolutionary algorithms
Evolutionary algorithms, or Evolutionary Computation (EC), is a heuristic-based approach which uses methods inspired by biological evolution, computationally mirroring some of its core principles, such as reproduction, mutation, recombination, and selection. The architecture of an EC algorithm includes three main steps. The first step is the initialisation step, where a set of possible solutions is chosen, most of the time randomly. The second step consists of the evolutionary iterations, each with two operational steps: (i) fitness evaluation and selection, and (ii) population reproduction and variation. The fitness evaluation consists of evaluating the objective function for all the individuals of the population (in the first iteration, the initialisation population), while the selection criteria make it possible to select the individuals that performed best, in order to determine a new population using reproduction (crossover, replacement) and variation (mutation) methods. Then, this new population is re-evaluated, and a new iteration is carried out until the evaluation of the optimisation function on an individual meets a termination criterion. Evolutionary learning algorithms are a family of algorithms which include genetic algorithms (GA), evolutionary programming (EP), evolutionary strategies (ES), genetic programming (GP), learning classifier systems (LCS), differential evolution (DE), and estimation of distribution algorithms (EDA) [129]. The strengths of evolutionary algorithms are that no gradient information is needed, that they can be implemented in a parallel manner, and that they are highly exploratory. Compared to traditional optimisation/search approaches, this enables evolutionary computation to be used for optimisation and search in problem domains where the structure cannot be well characterised in advance (e.g. optimising an unknown function that describes a user's utility for energy consumption, or predicting future power market prices).
On the other hand, evolutionary methods have inherent drawbacks: convergence can be slow, interpretability is limited, results can be unpredictable, and there is no guarantee of finding the optimal solution [130]. Because of their advantages, EC algorithms have been used in a variety of fields [131,132].
In the literature on energy DR the prevailing method from evolutionary computation is the genetic algorithm [130-134]; GA is a model which abstracts the biological evolution process as described in Charles Darwin's theory of natural selection [135]. At the HEMS level, GAs are used to find the optimal switching time of each appliance. In this case, an individual $x_i$ of the population consists of a set of binary values ($x_i = \{x_{i1}, x_{i2}, \ldots, x_{iJ}\}$) stating whether the corresponding appliance is on or off at the considered time $j$ [136,137], as explained below. For retailers' or aggregators' price scheme optimisation, GAs usually consider individuals that consist of a set of prices $p_i = \{p_{i1}, p_{i2}, \ldots, p_{iJ}\}$, with $p_{ij}$ the price for the $j$-th period of the day [130,134,138-140]. These prices are first generated randomly within the constraints, and the sets that produce the best outputs then generate the next generation of prices. The objective function used to compute the output is usually a cost or benefit function, which aims at maximising the aggregator's profit. Each approach then has its own replacement, crossover and mutation methods between the different price sets of one population. Finally, GAs have also been used to train a neural network [141] and to find the optimal parameters of an SVR model [55].
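The binary encoding described above for appliance scheduling can be illustrated with a minimal genetic algorithm. In this sketch an individual is a list of on/off values per time period, and the fitness is the electricity cost plus a penalty when the appliance does not run for the required number of periods. The operators chosen here (tournament selection, one-point crossover, bit-flip mutation, elitism), the penalty weight and the prices are assumptions for illustration; the reviewed papers each use their own operators and objective functions.

```python
import random


def cost(schedule, prices, required_on, penalty=10.0):
    """Energy cost plus a penalty for deviating from the required number of on-periods."""
    energy_cost = sum(p * x for p, x in zip(prices, schedule))
    return energy_cost + penalty * abs(sum(schedule) - required_on)


def genetic_schedule(prices, required_on, pop_size=30, generations=60,
                     mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(prices)

    def fitness(individual):
        return cost(individual, prices, required_on)

    # Initialisation: random binary individuals.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    initial_best = min(fitness(ind) for ind in population)

    for _ in range(generations):
        elite = min(population, key=fitness)
        children = [elite[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            # Selection: the best of three random individuals becomes a parent.
            parent_a = min(rng.sample(population, 3), key=fitness)
            parent_b = min(rng.sample(population, 3), key=fitness)
            # Reproduction: one-point crossover.
            cut = rng.randrange(1, n)
            child = parent_a[:cut] + parent_b[cut:]
            # Variation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children

    best = min(population, key=fitness)
    return best, fitness(best), initial_best


prices = [0.30, 0.25, 0.10, 0.08, 0.12, 0.28, 0.35, 0.20]  # made-up prices per period
best_schedule, best_cost, initial_best_cost = genetic_schedule(prices, required_on=3)
```

Because the elite individual is copied unchanged into each new generation, the best cost can never get worse from one generation to the next, although, as noted above, there is no guarantee that the global optimum is reached.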
Furthermore, variations of the GA have been used in the multi-objective setting, primarily the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [142]. NSGA-II is an evolutionary algorithm that employs an elitist strategy to discover Pareto-optimal solutions for multi-objective problems, while being efficient in handling various constraints [143]. In DR it has been widely applied to the multi-objective scheduling of loads [144-148].
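The notion of Pareto-optimality that underlies NSGA-II can be illustrated with a simple dominance check. The sketch below extracts only the first non-dominated front from a set of candidate schedules scored on two objectives to be minimised; it is not an implementation of NSGA-II itself (which additionally ranks subsequent fronts and uses crowding distance), and the (cost, discomfort) values are made up.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions):
    """Keep the solutions that are not dominated by any other solution."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Made-up (electricity cost, user discomfort) pairs for five candidate load schedules.
candidates = [(10, 5), (8, 7), (12, 4), (9, 9), (11, 6)]
front = pareto_front(candidates)
```

Here `(9, 9)` is dominated by `(8, 7)` and `(11, 6)` by `(10, 5)`, so the front contains the three remaining trade-offs between cost and discomfort.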
Other evolutionary algorithms which have been used in the DR setting are the population-based differential evolution algorithm [140], which can be thought of as a further extension of GA with explicit updating equations [130], a differential evolutionary algorithm (EA) for the multi-objective management of lithium-ion battery storage in a datacenter for DR [149], and a bi-level EA to determine a retailer's optimal power pricing in the face of DR strategies of consumers trying to minimise their electricity expenses [134].

Swarm artificial intelligence
The term swarm intelligence refers to a subdomain of AI related to the intelligent behaviour of biological swarms and how simulating these biological behaviours can be used to solve various tasks [150]. The swarm intelligence algorithms most commonly found in the literature are Particle Swarm Optimisation (PSO) [151] and Ant Colony Optimisation (ACO) [152]. The works of Kar [153], Chakraborty and Kar [154], and Lakshmaiah et al. [155] are reviews that provide extended information about these algorithms. Similarly to evolutionary methods, swarm AI methods suffer from slow convergence and the risk of getting stuck in local optima [128]. On the other hand, in swarm AI algorithms all particles' histories contribute to the search, unlike in GA where "poor" individuals are discarded [156]. Additionally, swarm AI methods have fewer parameters requiring prior tuning and adjustment, and are usually easier to implement.
In energy DR, swarm AI algorithms are mostly used at the aggregator or retailer level in order to find the optimal scheduling or pricing scheme to minimise a cost function. Indeed, in DR, the optimisation problems often consider a large number of variables, with quadratic optimisation functions and constraints from AC power flow computation that make the problem non-convex. In this context, heuristic optimisation can easily find a near-optimal solution in less time than other mathematical techniques. Among these heuristic optimisation techniques, PSO is the most widely used in DR. PSO is based on the natural social behaviour of animals associated with swarms (e.g. flock of birds, fish shoal), where each of the individuals constituting the swarm (called particles) searches for an objective (e.g. food) but also considers the findings of the other individuals in the swarm [150].
When PSO is used for scheduling the customers' consumption [86,157-159] or for scheduling VPP assets (including loads) [160-164], a particle $p$ is defined as a matrix $X_p = [x_{ijp}]_{N \times J}$, where $N$ is the number of loads (customers, appliances, or VPP assets) and $J$ is the number of time periods in the considered DR scenario. Each $x_{ijp}$ corresponds to the state of the considered load $i$ at time $j$; $x_{ijp}$ can be a binary variable indicating whether the load is on (1) or off (0) [158,159,162], or it can be the power of the load [160-164].
PSO is an iterative process, where a population (swarm) of particles is initialised during the first step of the iteration by choosing the $x^0_{ijp}$ values randomly for each particle $p$ (alternatively, $x^0_{ijp}$ can be initialised using the result of the optimisation of a simplified problem using Mixed Integer Linear Programming [161]). In parallel with the choice of the particles' initial positions ($x^0_{ijp}$), the aggregator determines the utility function it wants to minimise, which is often given by the cost $c = \sum_{j} \sum_{i} p_j \cdot P_i \cdot x_{ij}$ (summing over time periods $j$ and assets $i$) in the case where $x_{ij}$ is a binary variable, but it could also be a multi-objective function that also integrates the Peak to Average Ratio [136,157,158,159].
For the use case of loads scheduling in VPP, Pereira et al. [163] provide an optimal scheduling based on a multi-objective function that includes the cost for customers and the operational costs of the aggregator. Similarly, Pereira et al. [163] and Faria et al. [164] optimise the assets schedule based on four demand response remuneration programs (that belong to incentive and price-based categories). In Faria et al. [164], PSO is used to minimise the operational costs of the VPP, while ensuring load balancing and meeting resource capacity and DR shifting constraints. In Pedrasa et al. [162], the authors include constraints on curtailment duration and aim at minimising the cost for the consumption of the group of interruptible loads. Finally, electric vehicles and Vehicle-to-Grid charging can also be addressed by PSO algorithms [161].
The customers' discomfort for reducing consumption during a DR event can also be included in this utility function, as proposed in Herath and Venayagamoorthy [159,162,165]. The aggregator also takes into account the constraints of all the loads, such as the maximum and minimum power, but also the minimum and maximum time of use. Then, the utility function is evaluated for each of the particles, in order to prepare the update of the position of each particle in the next iteration. Indeed, each particle will be brought closer to the particle $p^k_{best}$ that reached the best cost $c^k_{best}$ at iteration $k$, while also taking into account its own best position. Unlike in GA, all the particles are kept and updated. At iteration $k$, the position of each particle $p$ will be updated from $x^k_{ijp}$ to $x^{k+1}_{ijp}$ using Equation (5):
$$x^{k+1}_{ijp} = x^{k}_{ijp} + v^{k+1}_{ijp} \qquad (5)$$
where $v^{k+1}_{ijp}$ is defined as the velocity of particle $p$ at iteration $k+1$ in the direction $i,j$, and is given by Equation (6):
$$v^{k+1}_{ijp} = \omega_p v^{k}_{ijp} + c_1 r_1 \left(x^{best}_{ijp} - x^{k}_{ijp}\right) + c_2 r_2 \left(x^{k}_{ijp^k_{best}} - x^{k}_{ijp}\right) \qquad (6)$$
where $c_1$ and $c_2$ are the cognitive and the social acceleration constants respectively, $r_1$ and $r_2$ are random numbers between 0 and 1, $\omega_p$ is the inertia of the particle $p$, which can evolve through time [157], $x^{best}_{ijp}$ is the position of particle $p$ that gave the best (lowest) cost $c$ among its previous positions, and $x^{k}_{ijp^k_{best}}$ is the position of the particle $p^k_{best}$ that achieved the best cost in the swarm at iteration $k$. The initial velocity is usually set to 0, and the velocity should stay within boundaries, which can be defined using price information, as proposed in Faria et al. [166]. Once all the particles have been updated and constrained within the boundaries defined by the aggregator and the consumers' preferences (limits of time of use for each load, for example), the current best position of each particle $x^{best}_{ijp}$ is updated, and the process is repeated until the termination criteria on the cost function $c$ are met. The optimal scheduling of loads through the day is then given by $X_{p_{best}} = [x_{ijp_{best}}]$. PSO can also be used to determine an optimal price scheme, in which case a particle's position can be defined by $P_p = \{p_1, \ldots, p_J\}$, where $p_j$ is the price at time $j$. Some researchers also implement a Gaussian mutation in the parameters $c_1$, $c_2$ and $\omega_p$ in order to improve the exploration of the space [161,166].
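The PSO loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any of the reviewed papers: each particle is flattened into a vector whose entries are relaxed to the interval [0, 1], $P_i$ is assumed to be the rated power of load $i$, a requirement that each load runs for `required_on` periods is enforced with a quadratic penalty, and the social term uses the best position found by the swarm so far (the common global-best variant). All names and parameter values (`schedule_cost`, `pso_schedule`, `w`, `v_max`, and so on) are hypothetical.

```python
import random


def schedule_cost(x, prices, powers, required_on, penalty=10.0):
    """Energy cost sum_j sum_i p_j * P_i * x_ij, plus a quadratic penalty when
    load i is not on for its required number of periods (assumed constraint)."""
    n_periods = len(prices)
    cost = 0.0
    for i, power in enumerate(powers):
        row = x[i * n_periods:(i + 1) * n_periods]
        cost += sum(p * power * x_ij for p, x_ij in zip(prices, row))
        cost += penalty * (sum(row) - required_on[i]) ** 2
    return cost


def pso_schedule(prices, powers, required_on, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, v_max=0.5, seed=0):
    """Return (best position, best cost, best cost per iteration)."""
    rng = random.Random(seed)
    dim = len(powers) * len(prices)
    f = lambda x: schedule_cost(x, prices, powers, required_on)

    # Random initial positions, zero initial velocities.
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # best position of each particle
    best_cost = [f(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]   # best position in the swarm
    history = [g_cost]

    for _ in range(n_iter):
        for p in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[p][d]
                     + c1 * r1 * (best_pos[p][d] - pos[p][d])   # cognitive term
                     + c2 * r2 * (g_pos[d] - pos[p][d]))        # social term
                vel[p][d] = max(-v_max, min(v_max, v))          # velocity bounds
                pos[p][d] = max(0.0, min(1.0, pos[p][d] + vel[p][d]))
            cost = f(pos[p])
            if cost < best_cost[p]:
                best_cost[p], best_pos[p] = cost, pos[p][:]
                if cost < g_cost:
                    g_cost, g_pos = cost, pos[p][:]
        history.append(g_cost)
    return g_pos, g_cost, history
```

Calling `pso_schedule([0.30, 0.10, 0.20, 0.40], [1.0, 2.0], [2, 1])` schedules two loads over four periods; rounding the returned entries gives an on/off schedule, although a binary PSO variant would be the more faithful choice when $x_{ij}$ must be strictly binary.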
Finally, for optimal scheduling of loads at the aggregator level, Margaret and Uma Rao [167] also use the Artificial Bee Colony (ABC) algorithm, which imitates the food searching behaviour of honeybees. Similarly, at the HEMS level, Kazemi et al. [137] propose a Gray Wolf Optimiser (GWO) to schedule the appliances based on the price from the retailer and on each appliance's needs. The GWO algorithm draws inspiration from the social hierarchy and hunting behaviour of gray wolf packs [156].

Other nature-inspired meta-heuristics
In addition to the aforementioned algorithms, various nature-inspired meta-heuristics have been found which cannot be classified in the existing groups. In Herath et al. [165] the CLONALG-based [168] Artificial Immune System (AIS) algorithm, derived from the processes found in biological immune systems [169], is used to determine the aggregators' pricing scheme. Developed on the annealing concept, the simulated annealing method is employed for DR in Spinola et al. [86], and the Wind Driven Optimisation (WDO) algorithm [136], which is based on atmospheric motion, is used to determine an optimal scheduling of appliances at the household level.

Artificial neural networks
Artificial Neural Networks (ANNs) are computational models inspired by, albeit not identical to, biological nervous systems. ANNs have been developed since the early years of AI as connectionist models; models which are large networks of simple processing units, massively interconnected and running in parallel [170]. Although ANNs could fall under both categories of machine learning and nature-inspired AI approaches, we present them in this review as a distinct category since they are heavily utilised in DR applications.
The basic components of an ANN are the units (or nodes), which are connected by directed links; the strength of each link is determined by a numeric weight. Nodes can be input nodes (which bring data into the network), output nodes, or hidden nodes (which modify the data en route from input to output). Each unit calculates a linear combination of its inputs, which is then passed to an activation/transfer function (e.g. sigmoid functions, ReLU) to derive the unit's output [45]. The properties of an ANN depend on the network's topology (the way units are connected) and the attributes of its units. The two main architectures of ANNs are the feedforward and the recurrent architecture [45]. In feedforward ANNs (FF-ANNs) the connections between units form a directed acyclic graph, whereas recurrent neural networks (RNNs) allow for feedback connections and thus form a directed cyclic graph. In FF-ANNs the nodes are usually structured in hierarchical groups of units called layers, in which case the units' inputs come only from units in the immediately preceding layer. There is no upper bound to the number of hidden layers an ANN can have.
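The computation performed by a feedforward network can be made concrete with a small sketch. The code below is an illustrative forward pass through a network with one sigmoid hidden layer and a linear output layer; the function names and the choice of a linear output are assumptions made for the example, not taken from the reviewed papers.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases, activation):
    """Each unit forms a weighted sum of its inputs plus a bias,
    then applies the activation function."""
    return [activation(sum(w * x for w, x in zip(unit_w, inputs)) + b)
            for unit_w, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Single hidden layer feedforward pass: sigmoid hidden units,
    linear output unit(s)."""
    hidden = layer(inputs, hidden_w, hidden_b, sigmoid)
    return layer(hidden, out_w, out_b, lambda z: z)
```

For instance, `forward([1.0, -1.0], [[1.0, 1.0]], [0.0], [[2.0]], [0.5])` feeds two inputs through one hidden unit (weighted sum 0, sigmoid output 0.5) and returns `[1.5]`.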
ANNs have been utilised for classification, clustering, pattern recognition and prediction across a number of disciplines [171]. In DR, ANNs of various architectures and depth (number of layers) have been used primarily for forecasting applications. Most DR applications use ANNs to forecast the future consumption of an asset (building, appliance, group of consumers), the flexibility of a load, or the electricity price in the short term (from several minutes to one day ahead). Indeed, ANNs can successfully replace nonlinear regression tools for those applications.
The inputs to be included depend on the variable that is forecasted. For instance, in load forecasting most of the implementations use inputs such as previous consumption (shortly before the target time, and sometimes at the same time on previous days), weather (mostly temperature), type of day (value between 1 and 7, or 0 for week days and 1 for week-ends), hour of the prediction, and sometimes the price. For a price forecast, inputs are mostly previous prices (for the same day and for previous days at the same time). Flexibility forecast, on the other hand, is a function of previous consumption, weather, and set points from the DR aggregator (current and previous set points). The output of these forecasts is generally the variable value at a future time (power consumption or price), but it can also be the main wavelet transform coefficients of the considered variable [172,173].
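As an illustration of the inputs listed above, the following sketch assembles one input vector for an hour-ahead load forecast. The function name, the hourly resolution, the specific lags (one hour and 24 hours) and the feature ordering are assumptions chosen for the example; individual papers select different lags and encodings.

```python
def load_forecast_features(load_history, temperature, hour, is_weekend, price=None):
    """Assemble one input vector for an hour-ahead load forecast.

    load_history: hourly consumption up to (not including) the target hour,
                  most recent value last; needs at least 24 values.
    """
    if len(load_history) < 24:
        raise ValueError("need at least 24 hourly values")
    features = [
        load_history[-1],             # consumption one hour before the target hour
        load_history[-24],            # consumption at the same hour on the previous day
        temperature,                  # weather input (temperature)
        float(hour),                  # hour of the prediction, 0..23
        1.0 if is_weekend else 0.0,   # type of day: 0 week day, 1 week-end
    ]
    if price is not None:
        features.append(price)        # optional price input
    return features
```

The resulting vector would be fed to the input nodes of the forecasting model, with the consumption at the target hour as the training label.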
Based on the extensive literature on the topic, two main classes have been identified: single hidden layer ANN and Deep Learning, as shown below.

Single hidden layer ANN
The most widely used class of ANNs in the DR domain is the single hidden layer, feedforward ANN. There are also cases where autoregressive feed-forward networks are built [174,175]. The only two papers found to be using RNNs are Liu et al. [176], where an Elman neural network is employed, and Lee and Moon [177] (non-linear autoregressive with external inputs RNN). In the DR context, the vast majority of the literature has employed single hidden layer ANNs for load and price forecasting. Besides these tasks, single hidden layer ANNs have been used to classify customers based on their potential participation in a DR event [178], and as simple black-boxes to model complex functions, such as consumers' thermal discomfort [179] or consumers' ability to shift their consumption [180], that mostly depend on temperature, time, type of day and price. All the ANNs that fall under this group use sigmoidal activation functions. Most of the papers use the logistic sigmoid function [65,178,181,182,183,184,185,186,187,188], although there are papers using other variants: hyperbolic tangent [67,180,189,190,191,192], bipolar sigmoid [179], and log-sigmoid [191].
The prevalent methods for training ANNs have been found to be gradient-based algorithms. The plain back-propagation with gradient descent [193] has been used in some cases [173,178,187,188,192,194], but the majority of the literature trains ANNs using variations of this algorithm to deal with its limitations. Some works try to avoid overfitting by using Bayesian regularisation [66,93,174,191,195], momentum [196], early stopping [197], and cross-validation [175,198]. The Levenberg-Marquardt Algorithm is used for training in numerous papers [64,67,93,179,180,189,190,198,199,200] to provide faster convergence than the plain backpropagation, in exchange for high memory usage. Further implementations include the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [65,181,182,183], resilient backpropagation (RPROP) [201], and kernel-based extreme learning machine [202], used for higher convergence speed. Non gradient-based methods for training ANNs are PSO [203], and a combination of PSO and GA in Xie et al. [141].
Even though the universal approximation theorem [204] states that a FF-ANN with a single hidden layer is sufficient to approximate any continuous function, there is evidence that utilising models with more hidden layers (deep ANNs) can result in architectures with a smaller number of units and lower generalisation error [205].

Deep learning
Deep learning is a branch of machine learning whose methods learn multiple levels of representation and abstraction; they are able to process data in their raw format and to discover automatically the representations needed for detection or classification [205]. Even though the modern term deep learning can be applied to ML frameworks that are not necessarily neurally inspired [205], the most common use of the term refers to ANNs which have two or more hidden layers.
Deep learning approaches have given very promising results and have achieved human or even superhuman performance [206] in certain types of problems. There are many different architectures of deep neural networks. The most commonly used for supervised learning are feedforward NNs [193], convolutional NNs [207] and RNNs [208], while autoencoders [209] and Restricted Boltzmann Machines [210] are commonly used in the unsupervised setting. Deep learning can also be used in conjunction with RL, leading to deep reinforcement learning [105]. In our search, the primary use of deep architectures in DR has also been for load and price forecasting tasks, as in the case of single hidden layer ANNs. Additionally, deep architectures have been applied to predict the users' response behaviour [63], control residential appliances (considering DR events) [211], identify socio-demographic information about the consumers to help retailers provide more personalised services and make more reliable decisions on the targeting of DR [212], as well as for clustering customers based on the encoded load profile, by using deep autoencoders [213].
Similar to "shallow" ANNs, the prevalent topology of deep networks used for DR is the feedforward architecture [109,211,214,215,216,217,218]. Other types of deep ANNs found in the literature are long short-term memory (LSTM) [63], convolutional neural network (CNN) [212], and a deep RNN [217]. LSTM is a type of RNN which can better handle long-term dependencies, in exchange for higher computational cost, and CNNs are well suited for processing data with a grid-like topology. Most of these models have been used for regression, with the exception of Ahmed et al. [211] and Wang et al. [212], and have been trained with the Levenberg-Marquardt backpropagation algorithm [109,211,217]. To avoid overfitting, the surveyed literature has used data augmentation [212], dropout [212,214], and training with momentum [215].
In comparison with traditional "shallow" techniques, deep learning has the ability to learn highly non-linear, complex relationships and correlations between the input and output data. For that reason, the DR literature shows that deep learning methods usually outperform, in prediction accuracy, traditional techniques like SVR [63,214,217], shallow ANNs [63] and Random Forest [63]. However, this flexibility comes at a cost. Specifically, deep learning architectures require a large amount of data to outperform other approaches, are computationally expensive to train, and are not easily interpretable. Further, it is not fully understood why they work so well in certain types of problems [219], and it should be noted that arbitrarily increasing the depth of an ANN might not always yield the best results [220].

Multi-agent systems
Due to the distributed nature of the demand-side in power systems there is the need for approaches that can learn, plan and make decisions in an environment that involves multiple interacting intelligent agents. The tools to study these problems are provided by a sub-area of distributed AI called multi-agent systems (MAS). The subfields of MAS studied in this review are automated negotiations for the negotiation between the various participants in a scheme, cooperative/coalitional game theory for the study of coalitions among these participants, as well as mechanism design.

Coalitional game theory
Game Theory is a branch of economics that is largely concerned with decision making by self-interested entities [221,222]. The main concept in Game Theory is the game, which is a mathematical model that describes and captures the main features of the interaction between these self-interested entities [223]. One of the key objectives of game theory is to understand what constitutes a rational outcome of a game, and numerous solution concepts have been developed to find a subset from the set of possible outcomes in a game (e.g. Nash Equilibrium).
Coalitional (or cooperative) game theory is one of the basic classes of game theory. Cooperative game theory abstracts away from individual players' strategies and instead focuses on the coalitions players may form. It is assumed that each coalition may attain some payoff, and the goal is then to predict which coalitions will form (and hence the payoffs the agents obtain). Cooperative game theory concentrates on the division of the payoff, and not so much on what players do to achieve those payoffs [223].
In the DR context, cooperative game theory has been widely used, especially in the cases where there are binding agreements in place (i.e. incentive-based DR). The main applications of cooperative game theory in DR are the selection of the optimal set of electricity consumers to participate in DR schemes, and the allocation of the coalition's payoff among DR participants (known as the solution concept). The solution concept corresponds to the way the total revenue is split among the DR flexibility participants, which depends on the criteria that the aggregator wants to meet. Some of the main solution concepts are the Shapley Value (fairness criterion), the Banzhaf Index (fairness criterion), the core (coalitional stability), Nucleolus (based on the notion of deficit), Kernel and Stable set. In DR, the most commonly used solution concept is the Shapley Value (SV), which defines a fair way of distributing the payoff among participants after a DR event [224][225][226]. Indeed, the SV proposes a unique, fair and symmetrical distribution of the effort, reward or penalty between participants of a DR program, as it proposes a reward to each participant that is proportional to their contribution.
For example, in a coalition game, we can consider the total expected payoff for the set of participants to a DR event $S \subseteq \chi$ (with $\chi = \{1, 2, \ldots, N\}$ the set of all $N$ loads associated with the considered aggregator), defined by a characteristic function $v : S \in 2^N \to \mathbb{R}$. This characteristic function can be determined by an aggregator. In the case of a load reduction DR event, $v(S) = c \sum \ldots$, and the Shapley value of participant $k$ is computed over the coalitions $S \subseteq \chi \setminus \{k\}$, where $\chi \setminus \{k\}$ is the set of all participants except participant $k$.
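For reference, the standard definition of the Shapley value of participant $k$ is $\phi_k(v) = \sum_{S \subseteq \chi \setminus \{k\}} \frac{|S|!\,(N - |S| - 1)!}{N!} \left[ v(S \cup \{k\}) - v(S) \right]$, i.e. the weighted average of the marginal contribution of $k$ to every coalition that does not contain it. The sketch below evaluates this definition by direct enumeration; the function name and the example characteristic function are hypothetical and are not the payoff function of any reviewed paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley value of each player for a characteristic function
    v that maps a frozenset of players to a payoff."""
    n = len(players)
    phi = {}
    for k in players:
        others = [p for p in players if p != k]
        total = 0.0
        for size in range(n):
            # Weight |S|! (N - |S| - 1)! / N! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (v(s | {k}) - v(s))   # marginal contribution
        phi[k] = total
    return phi
```

As a usage example, assume three loads offering reductions of 2, 1 and 1 kW, and a payoff of 10 per kW up to a target of 3 kW, so that `v = lambda s: 10.0 * min(sum(reductions[i] for i in s), 3.0)`. The enumeration then gives 50/3 to the first load and 20/3 to each of the other two, which sum to the total payoff of 30. The loop visits every subset of the other players, so its cost grows exponentially with the number of participants.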
Although the concept of the Shapley value provides a fair and unique solution to coalitional games, its downside is its computational hardness, and for that reason a number of papers have proposed approximations. Bakr and Cranefield [227] compare three methods for calculating the exact or approximate Shapley value, which include a linear-time approximation proposed in Fatima et al. [228], and a stratified random sampling proposed in Maleki et al. [229]. O'Brien et al. [226] propose a stratified sampling technique in conjunction with an RL heuristic to approximate the Shapley value.
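To give a flavour of sampling-based approximation, the sketch below estimates Shapley values by averaging marginal contributions over randomly drawn orderings of the participants. This is plain permutation sampling, shown only as a simple baseline; it is not the stratified sampling of Maleki et al. [229], the linear-time approximation of Fatima et al. [228], or the RL heuristic of O'Brien et al. [226]. The function name, sample count and seed are illustrative assumptions.

```python
import random


def shapley_permutation_sampling(players, v, n_samples=2000, seed=0):
    """Estimate Shapley values by averaging each player's marginal
    contribution over randomly sampled orderings of the players."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(n_samples):
        rng.shuffle(order)
        coalition = frozenset()
        value = v(coalition)
        for p in order:
            coalition = coalition | {p}      # player p joins the coalition
            new_value = v(coalition)
            phi[p] += new_value - value      # marginal contribution of p
            value = new_value
    return {p: total / n_samples for p, total in phi.items()}
```

Each sampled ordering requires only N evaluations of the characteristic function, and the estimates always sum to the payoff of the grand coalition minus that of the empty coalition, because the marginal contributions telescope within every ordering.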

Mechanism design
Mechanism design is a strategic variant of social choice theory. Under this theory agents are assumed to behave in a way that maximises their individual payoffs. In mechanism design, the goal is to design a game (e.g. DR pricing scheme, scheduling of appliances) in a way that the equilibrium of the game is guaranteed to have a specific set of properties, independent of the unknown individual preferences (e.g. unknown preferences of DR participants) [230]. As stated in the book of Shoham and Leyton-Brown [230], mechanism design can be thought of as an exercise in "incentive engineering".
In the DR literature mechanism design has been widely applied, because it is of high importance to the success of DR schemes to guarantee certain properties. In DR, mechanism design is primarily utilised to design incentive-based mechanisms where consumers are incentivised to provide truthful bids. Several papers propose DR mechanisms that make sure that the consumers will maximise their utility function by reporting their preferences truthfully [115,231,232,233]. Such mechanisms are called Incentive Compatible (IC) mechanisms. Hayakawa et al. [231] propose a scheduling and payment function based on future prices and end-users' preferences for the different time periods within the day. Two dominant-strategy equilibrium, "penalty-bidding" mechanisms are proposed in Ma et al. [233], and Ma et al. [232] propose a mechanism that uses a "reward-bidding" approach rather than the approach of Ma et al. [233] to stimulate truthful behaviours. Meir et al. [234] use the Vickrey-Clarke-Groves (VCG) mechanism to design DR contracts, ensuring that the participating agents will reveal their true costs for participating in DR. Finally, Kota et al. [235] propose a cooperative mechanism that is efficient and incentive compatible, in the sense that participants do not gain by augmenting their baseline consumption to show an artificial demand reduction. In this mechanism, the aggregator selects a subset of agents, places bids for this subset in the electricity flexibility market and distributes the revenue among agents according to their consumption reduction commitment, while penalising those who increased their consumption.
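Incentive compatibility can be illustrated with a deliberately simple procurement setting, which is an assumption of this sketch and not the contract design of Meir et al. [234]: an aggregator needs `k` load reductions of one unit each, every agent can supply one unit at a private cost, the `k` cheapest reported costs are selected, and every selected agent is paid the (k+1)-th lowest reported cost. In this unit-supply setting that payment coincides with the VCG payment, so reporting the true cost is a dominant strategy. All function names are hypothetical.

```python
def procure_reductions(reported_costs, k):
    """Select the k agents with the lowest reported costs and pay each of
    them the (k+1)-th lowest reported cost (VCG payment for unit supply)."""
    if not 0 < k < len(reported_costs):
        raise ValueError("need 0 < k < number of agents")
    ranked = sorted(reported_costs, key=reported_costs.get)
    winners = ranked[:k]
    threshold = reported_costs[ranked[k]]   # first losing bid sets the payment
    return {agent: threshold for agent in winners}


def utility(agent, true_cost, payments):
    """Payment received minus true cost if selected, zero otherwise."""
    return payments[agent] - true_cost if agent in payments else 0.0
```

Because the payment to a selected agent does not depend on its own report, misreporting can only change whether the agent is selected: overstating the cost risks losing a profitable selection, and understating it risks being selected at a payment below the true cost.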

Automated negotiation
Broadly defined, negotiation is an allocation mechanism that can be used to allocate goods (e.g. in Bajari et al. [236]), resources (e.g. in Sun et al. [237]), or tasks (e.g. in Edalat et al. [238]), among a set of agents. Existing literature identifies two main classes of allocation mechanisms [239]:
• Auctions are mechanisms where one side automates the process during which participants from the other side compete among themselves. In this case there is a fixed protocol as well as rules. The aim of auction theory is to create an optimal auction design so that certain desirable properties are guaranteed, using mechanism design principles discussed in Section 3.4.2.
• Negotiations are a rich and not so well-defined group of processes used for allocating goods, resources, services or tasks, and they include an exchange of information comprised of offers, counteroffers and arguments with the purpose of reaching a consensus [240]. Automated negotiation approaches allow for more decentralised, flexible protocols and for customised and complex agreements. The agents can use incomplete information about their opponent's (and their own) preferences, and the primary focus is on the design of the agents' strategies, not on the allocation mechanism itself.
In this section, the focus is on negotiation (bargaining) mechanisms, as mechanism design approaches have been discussed in Section 3.4.2.
There are a number of definitions of automated negotiation in the existing literature. In this work we use the broad definition of Lomuscio et al. [241]: "Negotiation is the procedure by which a set of agents communicate with one another to try to reach agreement on some matter of common interest." In more detail, in automated negotiation research the interest lies in the creation of software programs which will be able to negotiate on behalf of their users or owners [242]. These programs are called software agents, or more simply agents. In the most general sense, automated negotiation is mainly the design of high-level protocols for the interaction among agents and is one of the key research topics in multi-agent systems.
In automated negotiations related to energy DR, a buyer agent (consumer or aggregator) will negotiate with a seller agent (producer or retailer) on several issues, for all the periods of the day. Issues are the objects of the negotiation, e.g. the price, or the quantity of energy. For example, in the context of forward bilateral contracts, Lopes et al. [243] present a negotiation framework where buyers and sellers negotiate the amount of energy $E = \{E_1, \ldots, E_6\}$ and the prices $p = \{p_1, \ldots, p_6\}$ for the 6 periods that constitute one day. Negotiations take place in several stages. First comes a pre-negotiation phase, where the number and type of issues are defined by the market operator, and each agent determines its preferences for each issue. Each agent also sets its own (private) utility function. For the buyer, Lopes et al. [243] propose a cost function ($c = \sum_{i=1}^{6} p_i \cdot E_i$) to be minimised, with some constraints, i.e. the minimum energy quantity for each time period and for the whole day. For the seller, a benefit function $b$ is proposed, which takes into account the cost to produce one unit of energy. Finally, based on this utility function, each agent also defines the threshold utility value (highest acceptable cost for the buyer, lowest acceptable benefit for the seller), under/over which it will not agree to accept a deal. After this pre-negotiation phase comes the actual negotiation phase, where each agent applies its strategy to obtain the best deal. The negotiation consists of an iterative process, where for each iteration, an agent makes an offer (consisting of a specific value for each issue under negotiation). The other agent may accept the offer, send a counteroffer, or end the negotiation if the offer results in a value for its utility function under/above the threshold it has determined before. In the case of a counteroffer, the process is repeated until one of the agents accepts the other agent's offer or abandons the negotiation.
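The offer and counteroffer loop described above can be sketched as follows. This is a simplified illustration and not the framework of Lopes et al. [243]: the energy quantities are held fixed and only the prices are negotiated, both agents follow an assumed linear concession strategy between their opening and reservation prices, the seller's benefit is assumed to be $\sum_i (p_i - C) \cdot E_i$ with a single unit production cost $C$, and each agent accepts an incoming proposal when it is at least as good as the proposal the agent itself would make next. All names are hypothetical.

```python
def buyer_cost(prices, energy):
    """Buyer utility to minimise: c = sum_i p_i * E_i."""
    return sum(p * e for p, e in zip(prices, energy))


def seller_benefit(prices, energy, unit_cost):
    """Assumed seller utility to maximise: sum_i (p_i - unit_cost) * E_i."""
    return sum((p - unit_cost) * e for p, e in zip(prices, energy))


def concede(start, limit, step, n_steps):
    """Linear concession from the opening prices towards the reservation prices."""
    t = min(step / n_steps, 1.0)
    return [s + t * (l - s) for s, l in zip(start, limit)]


def negotiate(energy, buyer_open, buyer_limit, seller_open, seller_limit,
              unit_cost, max_rounds=20):
    """Alternating offers; returns the agreed price vector, or None if the
    agents run out of rounds without agreement."""
    for r in range(max_rounds + 1):
        offer = concede(seller_open, seller_limit, r, max_rounds)     # seller proposes
        counter = concede(buyer_open, buyer_limit, r, max_rounds)     # buyer's own proposal
        # Buyer accepts if the offer costs no more than its own counteroffer.
        if buyer_cost(offer, energy) <= buyer_cost(counter, energy):
            return offer
        # Seller accepts the counteroffer if it is worth at least its next offer.
        next_offer = concede(seller_open, seller_limit, r + 1, max_rounds)
        if (seller_benefit(counter, energy, unit_cost)
                >= seller_benefit(next_offer, energy, unit_cost)):
            return counter
    return None
```

Since neither agent ever proposes or accepts beyond its reservation prices, any deal returned respects the buyer's highest acceptable cost and the seller's lowest acceptable benefit; when the seller's reservation prices exceed the buyer's in every period, the function returns `None`.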
As automated negotiations are performed by software agents, it is natural to use AI techniques to improve the negotiation strategy of the agents. In applications in the energy sector, Rodriguez-Fernandez et al. [244,245] propose a Q-learning approach (RL algorithm based on previous negotiations with all the other agents) to predict the expected prices for all possible scenarios, and then choose the best negotiation counteroffers and reach the deal with the highest/lowest utility. Moreover, Golpayegani et al. [124] utilise an argumentation-based negotiation, where the proposing agent justifies its proposal and the negotiating software agents can exchange arguments (encoded in formal logic) when they do not accept the opponent's proposal. The core idea is that these arguments will help agents to search for and propose offers that are more likely to be accepted by their opponent.

Application areas of AI in demand response
For the effective implementation of DR programmes, there are numerous issues that need to be considered; from load and electricity price forecasting to identifying the right consumers to participate in DR schemes and creating automated systems that manage demand-side resources. AI methods have been applied across the spectrum of DR by providing the tools for prediction, real-time efficient control of distributed systems, decision-making, while adapting to an ever-changing environment and learning from human behaviour [246]. In this section, we identify the areas of DR where AI has been employed in the literature and classify them accordingly. The proportion of the reviewed literature where AI has been used for each particular DR application area is shown in Fig. 7.
Fig. 6. Proportion of the reviewed literature using specific group of AI techniques for DR purposes.

Forecasting in DR
One of the major purposes for which AI techniques have been employed is forecasting. It has been identified that, in the DR context, AI methods have been used for the prediction of electricity prices and various load types. Forecasting can inform real-time electricity scheduling, as well as longer-term system and service providers' planning [247]. Short-term forecasts can improve electricity scheduling, enabling aggregators to provide better services, and consumers to respond closer to optimally to DR signals. Better long-term forecasts can enhance the planning process, helping service providers and operators to have a better understanding of the available flexibility, which consumers to target for DR, and how to set DR signals (compensation/prices).

Load forecasting
Prediction and estimation of loads is an integral part of a reliable and efficient power system operation. Effective demand forecasting is an important tool for tackling various issues in DR, including proper planning, rewarding DR participants, and estimating the capacity potential of DR resources [18]. A widely used distinction in demand forecasting, based on prediction horizon, is between long-term load forecasting (> 24h) and STLF (< 24h). In this review, the papers included are those which explicitly look at the load forecasting problem in the DR domain. The reader interested in the wider spectrum of load forecasting in the smart grid context can refer to the review of Raza and Khosravi [18].
Another basic distinction is whether load is predicted while taking into account, or not, the DR factor. In the literature there is a variety of papers which estimate demand including reduction or shifting due to DR. The bulk of the papers predict demand for the short term or day-ahead [57,58,78,93,176,196,200,203,215,217], but there are also papers forecasting the week-ahead load [177]. Moreover, load forecasting has been performed at various aggregation levels: residential [58,173,174,199], large buildings [93,187,196], and appliance-level (e.g. chiller, ice bank, lighting) [56,217].
For aggregated residential loads forecasting, Zhou et al. [58] and Zhou et al. [57] compare different forecasting techniques, including least squares, lasso and ridge regressions, kNN regression, SVR and decision tree regression. Similarly, Cheung et al. [78] provide a 1-h ahead aggregated load forecasting using SVR and ANN to a dataset already partitioned based on temporal features. Aggregated loads forecasting can also focus on determining day-ahead peak demand, either at a building level [93], or at a feeder or community level [203,215].
For domestic load forecasting, Pereira et al. [93] use an ANN based algorithm for single prosumer consumption and production forecast for day ahead, based on historic demand and temperature. Load forecasting at the appliance level can also be done with ANNs. Schachter and Mancarella [200] utilise ANNs for HVAC systems load forecasting, and Mohi Ud Din et al. [217] focus on the prediction of loads of domestic appliances by using deep neural networks with a PCA-based feature selection scheme.
The case of load forecasting without factoring for DR is referred to as baseline load estimation. The baseline load is the counterfactual power consumption in the absence of a DR scheme and is important in the context of DR [184]. The baseline consumption estimation of consumers plays a key role in the implementation of the various DR programs, and it is utilised to reliably estimate the consumers' normal power consumption, which is subsequently used to reward the DR participants [184]. In the reviewed literature, there is work regarding baseline load estimation for a residential environment [89,184], industrial factories [64], and office buildings [76,194,248]. There are also instances of forecasting the baseline using aggregated loads over time [60,61], over consumers [60], and over independent energy processes [194]. Aggregating loads can lead to smaller prediction variance, and allocation of DR rewards with much higher confidence.
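The role of the baseline in rewarding participants can be illustrated with a deliberately simple estimator. The sketch below assumes that the baseline at a given hour is the average consumption at that hour over previous days without DR events, and that the rewarded reduction is the baseline minus the metered consumption during the event; this averaging rule and the function names are assumptions for illustration, whereas the reviewed papers estimate the baseline with ML models.

```python
def baseline_load(previous_days, hour):
    """Average consumption at the given hour over previous non-event days."""
    return sum(day[hour] for day in previous_days) / len(previous_days)


def estimated_reduction(previous_days, event_day, event_hours):
    """Baseline minus metered consumption, summed over the DR event hours."""
    return sum(baseline_load(previous_days, h) - event_day[h] for h in event_hours)
```

With `previous_days = [[1.0, 2.0, 4.0], [3.0, 2.0, 2.0]]`, `event_day = [2.0, 1.0, 1.0]` and an event covering hours 1 and 2, the baselines are 2.0 and 3.0, so the estimated reduction is 3.0. Any error in the baseline translates directly into an error in the reward, which is why its estimation accuracy matters.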
Complementary to the above, flexibility forecasting is frequently studied in the research literature. Flexibility is defined as the effect of the smart-grid control signal on the load, which is considered as a function of time, weather circumstances and the control signal [67]. Knowledge of the available capacity for DR is considered crucial and is beneficial for developing and optimising DR strategies, as well as for assessing their economic value [67]. In the context of VPPs, there is work that estimates the flexibility of a cluster of heating devices for DR, which are assumed either homogeneous [65,181,182], or heterogeneous [183]. The estimated flexibility can be traded either in a single market (DA) [65,183], or in multiple energy markets (intraday, DA market, imbalance) [181,182]. Further studies include flexibility forecasting of residential heating systems [67], aggregated flexibility prediction [191], and estimation of the potential capacity for peak time DR.
Other literature related to load forecasting is the work of Liu et al. [63] where they predict the consumption's reduction of users under different incentives in DR, and the paper of Akhavan-Rezai et al. [190] where a prediction model of the future car arrivals is employed as part of a wider EMS for incorporating aggregated plug-in EVs in future smart parking lots. Future car arrivals translate to potential load which is going to be available for providing real-time pricing DR services.

Price forecasting
Prediction of the electricity prices has been done both at the aggregator and the consumer level. Li et al. [185] present a multi-aggregator setting, where only one aggregator is implementing a DR scheme, and they predict the regional, wholesale electricity price, based on the demand bids of the various aggregators to the SO. Additionally, Lu and Hong [109] apply a model to predict the wholesale electricity market price, and that forecast (among others) is used to obtain the optimal incentive rates for different consumers. At the consumer level the majority of the papers are concerned with forecasting the day-ahead prices of residential, dynamic pricing schemes [54,55,172,186,192]. A different example is the work of Huang et al. [216], where the dynamic electricity price of the next hour, only for industrial facilities, is forecasted.

Scheduling and control of loads for DR
The large number and range of devices which can be used for DR pose an important challenge, both for the companies offering services and the end-use consumers. In the case of service providers, it is technically infeasible to manage their portfolio of DR units without automating the process of units' scheduling and control. Additionally, for widening participation of consumers in DR schemes, it is imperative to schedule and control the multitude of demand-side appliances in an automated fashion; otherwise consumers will suffer from the phenomenon known as response fatigue [249], and eventually drop out of the DR programme. The scheduling and control of the various units for DR can be done either at the service provider (aggregator) level or at the consumer level. The main difference between the two levels is the scale and scope of units. Algorithms used to schedule and control devices at the aggregator level need to be more scalable, and able to work in a more diverse environment, than those used at the consumer level.

Load scheduling and control at the aggregator level
While control of units for DR is self-explanatory, in a scheduling problem the time schedule of a sequence of events needs to be planned to improve the time efficiency of the solution. The scheduling can actually be considered as a constrained multi-objective optimisation problem.
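As an illustration, the scheduling task can be written in the generic form of a constrained multi-objective problem. The symbols are assumptions introduced here for exposition and are not taken from any of the reviewed papers: $x$ is the vector of scheduling decisions (e.g. on/off states per time slot), $f_1,\dots,f_k$ are the competing objectives (e.g. cost and discomfort), and $g_j$, $h_l$ encode inequality and equality constraints.

```latex
\begin{aligned}
\min_{x \in \mathcal{X}} \quad & \bigl( f_1(x),\, f_2(x),\, \dots,\, f_k(x) \bigr) \\
\text{s.t.} \quad & g_j(x) \le 0, \quad j = 1, \dots, m, \\
& h_l(x) = 0, \quad l = 1, \dots, p.
\end{aligned}
```

When the objectives conflict, there is in general no single minimiser, and the solution is usually sought as a set of Pareto-optimal trade-offs.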
Regarding load scheduling, Pedrasa et al. [162] schedule the loads of DR participants for the day ahead (DA), and there is research on scheduling the DR resources in a VPP, assuming no constraints [160], as well as considering network constraints [166] and the system balance [163]. Furthermore, Medved et al. [121] propose a DA scheduling of the DR units in the aggregator's portfolio, with the objective of maximising the aggregator's profit while minimising the impact of variable resources on the grid. In this case there is no initial knowledge of the network's constraints; instead, the constraints are learned through interaction with the DSO. Herath and Venayagamoorthy [158] and Zhu et al. [250] employ a multi-objective and a cooperative model, respectively, for scheduling appliances in a smart neighbourhood; the inconvenience is factored into the model as the delay of each appliance and the deviation from an acceptable temperature range. A multi-objective decision-making framework is also proposed by Fotouhi Ghazvini et al. [147] to assist retailers with a small number of assets by scheduling their resources for DR, while trying to minimise the retailer's short-term financial losses and avoid future capacity charges.
Furthermore, Hurtado et al. [114] developed a cooperative and decentralised agent-based platform to exploit and manage the demand flexibility potential of non-residential buildings (part of an aggregator's portfolio), while taking into account the individual building dynamics. Moreover, there is also research on the setting of an aggregator controlling directly a cluster of homogeneous [117], and heterogeneous [119] residential thermostatically controlled loads (TCLs); the control of TCLs is for providing DR services. There is also research focused on scheduling the charging of EVs' fleets, for providing DR services [161,231].

Load scheduling and control at the consumer level
The automated scheduling and control of the various units at the level of power consumers is provided by individual systems called energy management systems (EMSs) [81,112,136]. An EMS acts as an agent for energy users, making automated decisions in response to DR signals while taking into account electricity expenses, the customers' comfort preferences and lifestyle trade-offs, as well as optimal utilisation of appliances/equipment. Automated EMSs are key to a higher adoption of DR schemes by residential and small commercial/industrial entities. Scheduling of loads for DR under an EMS has also been considered by Lin and Tsai [251] and Veras et al. [252], who propose an in-home power scheduler for domestic appliances that operates without user intervention, while taking into consideration constraints for the various household appliance groups.
Such trends of understanding consumer behaviour and appliance usage, coupled with non-intrusive load monitoring (i.e. monitoring which is "invisible" to the individual energy user [100,253,254]), are key for assuring user-friendly demand-side response, especially in residential settings, potentially enabling faster consumer adoption of demand response programmes.
The objectives in the scheduling and control problem of consumers' appliances are usually the minimisation of electricity cost [112,120,123,134,136,137,157,167], energy consumption [136,211], peak-to-average ratio (PAR) [136,137] and environmental pollution [144], as well as the maximisation of social welfare [255]. These objectives need to be met while considering the users' preferences at the same time. There are two basic approaches to formulating users' preferences. One is to represent the user preferences over home appliance use with a utility function, which can be pre-specified [81,112] or learned [79,111]. The other is to impose constraints on feasible schedules [123,134,137,157]. Typical appliances used for control in the DR context are TCLs such as heat pumps [81,118,120] and water heaters [81,118,211], air conditioners (ACs) [66,81,175,211], battery storage systems (BESS) [81,256], and EVs [81,127,189,256]. There has also been work where, along with the household appliances, the self-consumption of PV generation is scheduled in order to minimise the PV power fed back into the grid under a dynamic electricity pricing scheme [257], and work where the battery assets of a datacenter are scheduled for DR while trying to minimise the batteries' degradation [149].
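Two of the objectives listed above, electricity cost and PAR, can be evaluated for a candidate schedule with a few lines of code. The following is a minimal sketch; the function names, the four-slot horizon and all numbers are hypothetical and chosen only for illustration.

```python
def electricity_cost(load_kwh, price_per_kwh):
    """Total cost of a load schedule under a time-varying tariff."""
    if len(load_kwh) != len(price_per_kwh):
        raise ValueError("load and price series must have the same length")
    return sum(l * p for l, p in zip(load_kwh, price_per_kwh))


def peak_to_average_ratio(load_kwh):
    """Peak load divided by mean load over the scheduling horizon."""
    average = sum(load_kwh) / len(load_kwh)
    if average == 0:
        raise ValueError("PAR is undefined for an all-zero load profile")
    return max(load_kwh) / average


# Hypothetical tariff and two schedules with the same total energy (6 kWh).
prices = [0.10, 0.10, 0.30, 0.30]
baseline = [1.0, 1.0, 3.0, 1.0]   # consumption peaks in an expensive slot
shifted = [2.0, 1.0, 2.0, 1.0]    # 1 kWh moved to a cheap slot

for name, schedule in (("baseline", baseline), ("shifted", shifted)):
    print(name, electricity_cost(schedule, prices), peak_to_average_ratio(schedule))
```

In this toy case, shifting 1 kWh away from the expensive slot lowers both the cost and the PAR, which is the kind of trade-off a scheduler searches for subject to the user's preferences.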
Regarding the type of consumers, even though the majority of the papers are focused on residential buildings [79,120,137,157], there is also research for small commercial buildings [111], and smart EV charging stations which can receive peak demand signals and accordingly adjust their charging schedules to provide a DR service [189], and in the industrial setting where a multi-objective optimisation model has been employed to coordinate the load interruption strategies of complex industrial processes [146].

Design of pricing/incentive schemes (compensation mechanisms)
The way a pricing or incentive mechanism is designed affects not only the profitability of the aggregator, or the retailer company, but also the success of the DR scheme. How successful a DR programme is in appealing to new participants, and ensuring that consumers remain enrolled in it, relies in part on a fair and attractive compensation mechanism.
Regarding pricing mechanisms, the majority of the papers use AI techniques to find the optimal dynamic scheme for day-ahead in a hierarchical electricity market [66,80,107,108,138,139], while maximising the service provider's profit subject to realistic market constraints and consumers' discomfort for load reduction/shifting. Moreover, Babar et al. [110] and Herath et al. [165] have built a model based on a price elasticity matrix, which is proposed for dynamic pricing, and Gamage and Gelazanskas [198] use the modelled relationship between real-time price and electricity consumption in a DR scenario for real-time pricing. Robu et al. [225] propose a new tariff structure called prediction-of-use (POU), which calculates the tariff based on the difference between the predicted and the realised power consumption of the end-use consumers. Carrasqueira et al. [134] simultaneously explore different electricity prices (with upper and lower bounds for the prices) to charge the consumers in a bi-level model.
As far as the incentive mechanisms are concerned, there are quite a few papers concerned with fairly compensating a coalition of consumers who collectively reduce or shift their consumption during a load curtailment event [224,226,227]. Additionally, Lu and Hong [109] focus on learning the optimal incentive rates for different electricity consumers, considering the profitability of both consumers and the service providers (aggregators) in a hierarchical electricity market. Kota et al. [235] develop an incentive-based DR mechanism, where DR participants are rewarded based on their contribution towards a reduction target. The reward function has two components: a positive component, which is the agent's payment for participating in the reduction, and a negative component denoting any penalties imposed on the agent. Jain et al. [115] develop a model where monetary rewards (offers) are made to the consumers in exchange for reducing their consumption, while at the same time the model learns the probabilities of consumers accepting the offer. Xie et al. [141] learn the interruption load compensation price, which in turn is used as an initial assumption for a multi-round bidding model.
Furthermore, Meir et al. [234] propose a new DR mechanism that offers a flexible set of contracts for DR using Vickrey-Clarke-Groves pricing. In this new mechanism, a subset of consumers is selected in order to reduce consumption, while taking into account the probability that the reduction target is met (reliability). Ma et al. [232] generalise previous work [233] by incorporating uncertain costs for preparing, multiple levels of effort, and multi-unit consumption reduction. To achieve efficient incentives, this work proposes a reward-bidding approach instead of a penalty-bidding mechanism. There is also work concerned with the design of contracts in incentive-based DR. Lopes et al. [243] study bilateral contracts (involving a retailer agent and a commercial customer) in a multi-issue negotiation setting. Similarly, Haring et al. [258] design reward contracts for ancillary services, where service providers take part in the wholesale ancillary service market and coordinate consumer interaction at the retail level. Their work also takes into consideration the interaction among consumers, in addition to the communication between the service provider and the consumers.
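To make the idea of a price elasticity matrix concrete, the sketch below applies the common linear elasticity model, in which the relative change in demand in period i is the elasticity-weighted sum of relative price changes over all periods. This textbook form is an assumption made here; it is not necessarily the exact model used in [110,165], and the two-period data are hypothetical.

```python
def elastic_demand(base_demand, base_price, new_price, elasticity):
    """Demand after a price change under a linear elasticity model.

    elasticity[i][j] is the relative change in demand in period i per unit
    relative change in price in period j (diagonal: self-elasticity,
    off-diagonal: cross-elasticity, i.e. load shifting between periods).
    """
    n = len(base_demand)
    rel_price_change = [(new_price[j] - base_price[j]) / base_price[j] for j in range(n)]
    return [
        base_demand[i] * (1.0 + sum(elasticity[i][j] * rel_price_change[j] for j in range(n)))
        for i in range(n)
    ]


# Hypothetical two-period example: off-peak (index 0) and peak (index 1).
base_demand = [10.0, 10.0]          # kWh
base_price = [0.20, 0.20]           # currency units per kWh
new_price = [0.20, 0.30]            # peak price raised by 50%
elasticity = [
    [-0.10, 0.05],                  # off-peak demand rises when the peak price rises
    [0.05, -0.20],                  # peak demand falls when its own price rises
]

new_demand = elastic_demand(base_demand, base_price, new_price, elasticity)
print(new_demand)  # roughly [10.25, 9.0]
```

The negative diagonal entries capture load reduction in the period whose price rises, while the positive off-diagonal entries capture load shifted to the other period.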

Load/customer segmentation
Categorising electricity consumers in groups is an important application area for DR. It can support service providers in designing DR programmes, aggregating resources, evaluating the potential of loads to participate in different DR programmes, etc. [97].
In the researched literature, the generated groups of consumers are created to accomplish various tasks in the DR setting. A large part of the reviewed work classifies consumers to discover potential consumers for DR programmes [57,84,87,91,95], and to identify the optimal set of consumers (participating in DR schemes) to be called for demand curtailment during DR events [85,116]. Wang et al. [212] identify socio-demographic information from load profiles, and these consumer characteristics can be used to select potential DR participants, while Zeifman [259] classifies households based on their probability of enrolling in DR schemes. An additional part of the literature groups electricity consumers to support the DR compensation mechanism. Chen et al. [90] use the resulting typical daily load profiles in every group to design individualised electricity price schemes for price-based DR programmes, whereas Panapakidis et al. [92] utilise these typical load profiles to create dynamic price elasticity curves. Spinola et al. [88] cluster DR resources to obtain compensation prices. This way the most efficient resources are well compensated, which gives them the incentive to participate in the aggregator's scheduling. Other uses of customer classification are the design of DR programmes and load control schemes [97], aggregation of DR resources [86,260], the analysis of a DR project's potential benefit [68], and the identification of hourly loads for implementing DR programmes [178].
The most widespread categorisation of consumers for DR purposes is based on their load profiles [68,82,84,85,90,91,95,97]. The load features used for clustering could be peak load [84], average load of 5 consecutive weekdays [57], and various chosen attributes (e.g. mean relative standard deviation, seasonal score) [85,95]. On the other hand, there are methods for allocating consumers to groups without the use of load data. A number of works categorise consumers considering their bid-offer data in an incentive-based DR scheme [94], their behaviour (for EVs participating in DR) [261], the expected effect of the DR programme on them [96], or the number of household occupants, building size, building type and terrain type [87].
Moreover, additional work utilises clustering techniques to define flexibility envelopes for DR applications. In this vein, Spinola et al. [262] and Spínola et al. [263] have grouped the flexibility of DR resources, in support of an aggregator, while Kouzelis et al. [264] have grouped flexible loads for DR services. Trovato et al. [265] have created flexibility envelopes of TCLs for DR, whereas Develder et al. [266] have partitioned the flexibility of EVs for DR services by clustering EV charging sessions. Alizadeh et al. [267] have utilised a custom clustering algorithm to aggregate the flexibility of batteries and small deferrable loads for DR. In the more general case of electricity markets, clustering methods have been applied to compute the aggregated flexibility of an aggregator's portfolio of assets, such as the work of Iria and Soares [268].
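Load-profile-based segmentation can be illustrated with a small k-means sketch in plain Python. K-means is used here only as a representative clustering method; the four-slot profiles, the peak normalisation and the deterministic initialisation (first k profiles) are assumptions made for illustration.

```python
def normalise(profile):
    """Scale a load profile by its peak so that only the shape matters."""
    peak = max(profile)
    return [v / peak for v in profile] if peak > 0 else list(profile)


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids are initialised with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical daily profiles (kW) over four slots: morning, midday, evening, night.
profiles = [
    [1.0, 1.0, 4.0, 1.0],   # evening peak
    [4.0, 1.0, 1.0, 1.0],   # morning peak
    [2.0, 2.0, 8.0, 2.0],   # evening peak, larger household
    [6.0, 2.0, 1.0, 1.0],   # morning peak, larger household
]
labels, centroids = kmeans([normalise(p) for p in profiles], k=2)
print(labels)
```

Normalising by the peak groups consumers by the shape of their profile rather than by its magnitude; clustering on raw consumption would instead separate large from small consumers.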

AI industrial/commercial initiatives in demand response
Beyond being a highly active area of research, AI-based technologies have attracted growing interest from the energy industry, including stakeholders, DR companies, policy makers and utility companies, especially for tackling the complex challenge of balancing the power system. This section of the review discusses the potential of AI in DR services and presents a general overview of its current application in the industry. In addition, it reviews the changes and developments in business models due to the use of AI approaches. A catalogue of the companies that use AI technologies to provide demand-side ancillary services can be found in Table 2 of the Appendix.
The value in DR is constantly shifting as the needs of the power grid change. Originally, the simplest DR assets targeted by DR companies were the back-up generators and cold/heat storage of large businesses. This was because larger commercial assets were conventionally preferred, as they were easier to schedule, control, commission, etc. However, the emergence of new technical solutions (e.g. IoT, big data solutions), and the diminished revenue in simple DR markets due to the increased number of offers, have shifted the need for DR to real-time and fast response services. Consequently, certain regulatory changes are driving slower assets out of the market [269].
In response to these changes, the Department for Business, Energy and Industrial Strategy (BEIS) of the UK Government launched two different schemes to boost the use of innovative technologies in DR, namely the Innovative Domestic [270] and Non-Domestic Demand-Side Response Competition [271]. The attractive funding provided by these schemes created an environment for start-ups, spin-offs and other new companies to emerge. Hence, the new trend in industry is scalable DR and auto-DR. Through the use of AI and ML, companies can now integrate domestic and smaller assets into their portfolios. A good example is the Ubiquitous Storage Empowering Response (USER) project (Levelise), which aims to increase the number of prosumers in the domestic sector by using AI-led hot water tanks. This project claims that if 9 million tanks were managed using AI, they could have an aggregated capacity of 27 GW available for DR services [270]. Meanwhile, in the small and non-domestic DR competition, Flexitricity received funding for the aggregation of smaller HVAC and cold storage loads, and gridIMP for delivering a fully automated, self-learning DR electricity control system. The control system would be designed to learn the specific behaviours of consumers in order to adjust DR participation [271]. Thus, the available funding and industrial setting create a favourable environment for the emergence of numerous projects and initiatives in DR.
Furthermore, the industry survey conducted in this study has shown that a large proportion of new AI companies involved in DR have either been start-ups (more than 40% of the total number), or have recently been bought by, or teamed up with, larger companies (i.e. global consultancies and big corporations). To give just one example, Vattenfall, a Swedish power company, acquired all shares of the Dutch start-up company Senfal. It thereby combined its diverse portfolio of clients with the innovative and flexible technology of Senfal, which utilises optimisation, AI and ML [272]. Orsted and Open Energi are another example of a similar partnership. Open Energi's AI technology product is offered through its electricity supplier partner, Orsted. It aims to make DR decisions in a smarter way, as it is based on more granular asset and market data.
Regarding the application domain of AI for the reviewed DR companies, the survey has shown that the two most popular uses are forecasting and automation. These data-driven AI approaches have mainly used historical frequency data, along with pricing and weather data [273]. Moreover, the most widely adopted architecture is based on a cloud platform that collates data from numerous sources and uses machine learning to automate service participation, as in the case of Upside Energy [274]. Thus, the efficiency of DR solutions rests on various technologies such as big data management (primarily cloud based), ML techniques to interpret these data, optimisation algorithms, and IoT devices that allow bidirectional communication between the aggregator and the controllable end-users' appliances.
This growing interest of the industry in DR solutions is also well illustrated by the funded projects related to this topic. Table 3 presents several current projects which are funded by the European Union, through programmes like Horizon 2020. In each of these projects, DR is a solution proposed to the consumers for providing flexibility to the grid, while maintaining the comfort or economic welfare of the end-users. AI tools are mostly used for forecasting (load, production-weather, price) tasks. These forecasts are in turn used at the service provider level to provide an optimal scheduling of the flexibility. The current trend in the industry is to take advantage of new technologies (e.g. IoT, big data, AI) and automate DR, while providing interoperability across all platforms and devices [275].

Discussion
In the previous sections, we have performed detailed reviews of the fundamental AI techniques used in energy demand response, the key application areas of interest in this domain, as well as of the areas attracting ongoing industrial interest and investment. Against this background, in this section we present and discuss some summary statistics covering all the works reviewed, as well as a discussion of the key challenges and opportunities of the various techniques identified by our study.

Challenges and opportunities of using AI in DR
The research literature reviewed in this work shows that various groups of AI techniques have been used for numerous DR applications. Fig. 8 is a heatmap chart displaying the number of reviewed papers that have utilised a specific category of AI methods for a particular DR application area.

Forecasting
Looking at Figs. 6 and 8, it is apparent that one of the most heavily utilised families of methods is artificial neural networks, which have been mainly employed for forecasting applications. ANNs have been used both for load and price prediction, and researchers have applied them using a single hidden layer, as well as "deeper", multi-layer architectures. The capability of ANNs to learn arbitrary, non-linear, complex functions has made them attractive for forecasting tasks in DR [276], where the predictions can potentially relate to numerous inputs in a highly non-linear fashion. On the other hand, their performance can vary greatly depending on the set of variables selected as inputs, the training algorithm, and the tuning of their hyperparameters, and there is no single method that guarantees the optimal selection of these. Moreover, it needs to be taken into account that ANNs can be computationally expensive and usually require a large amount of data in order to outperform other, less flexible methods. This can pose a problem for DR applications, especially due to the current limited adoption of DR programmes.
Another set of methods which have been primarily used for forecasting purposes are supervised machine learning techniques. These methods are in general less flexible, higher-bias techniques than ANNs, and rely more heavily than ANNs on feature selection and feature engineering (the process of creating features using domain knowledge) to produce good results. On the other hand, supervised methods such as regression trees [75,76] and gradient boosting [54] can handle missing data better than ANNs, and require fewer examples to train, which has merits in the DR setting. Another important aspect of some supervised learning methods used in the DR literature is the use of probabilistic models for load forecasting, i.e. GPs [60,61,62]. Using a prediction model which outputs not only a point estimate but a distribution can lead to more informed decision making in DR, as well as better rewarding of the participants through a more accurate baseline estimation.
The future of demand response in smart grids is steering towards a highly granular control of the end-user loads. This calls for more accurate load and price forecasting. Traditional approaches to load and price forecasting in DR include time-series models such as autoregressive (AR), auto-regressive integrated moving average (ARIMA), and exponential smoothing [277]. Models of this type are generally linear in nature and have been shown to provide less accurate results in load forecasting [278]. The lower prediction performance of classical methods can be attributed to their linearity assumptions, which is the reason why ANNs, with their ability to approximate highly non-linear relationships, have been primarily employed for load and price forecasting in DR. Additionally, as demand is becoming increasingly non-linear and variable, AI methods are likely to show even more promising results in load and price forecasting. Another advantage of AI forecasting techniques is the ability to output forecasts that span multiple horizons in time and space, and the ability to incorporate uncertainty in the forecasts, leading to more informative predictions. On the other hand, AI approaches for forecasting are more computationally intensive, and their performance can vary depending on their hyper-parameter tuning and feature engineering.

Consumer/load clustering
In the current DR setting, there are limited labelled data on which to classify customers [279]. As a result, using clustering (unsupervised) models is the only viable approach to address the task of segmenting electricity customers. This is also supported by the research, as the vast majority of the papers reviewed use clustering techniques for creating customer groups. While clustering techniques are beneficial in this application, they present a number of challenges. Among others, these techniques require data pre-processing (e.g. normalisation) to work, suffer from the "curse of dimensionality", and produce results that are challenging to evaluate [52] due to the lack of labelled data.
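For reference, one of the classical baselines mentioned above, simple exponential smoothing, fits in a few lines. This is a minimal sketch of the standard recursion (level = alpha * observation + (1 - alpha) * previous level); the load values and the smoothing factor are hypothetical.

```python
def exponential_smoothing_forecast(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    The level is initialised with the first observation and then updated as
    level = alpha * y + (1 - alpha) * level for each later observation y.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level


# Hypothetical hourly load observations in kW.
hourly_load = [10.0, 12.0, 14.0]
forecast = exponential_smoothing_forecast(hourly_load, alpha=0.5)
print(forecast)  # 12.5
```

The forecast is a fixed weighted average of past observations, so it lags behind the upward trend in this series; this is one simple way to see why such linear models can struggle with strongly non-linear or fast-changing demand.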

Dynamic control
Continuing with ML approaches, reinforcement learning methods have been mainly employed for control tasks. At the consumer level, scheduling and control of the various DR units need to be automated (especially in the residential sector), which is why home EMSs are needed. Additionally, at the service provider level, especially in direct load control DR programmes, the multitude and variety of devices and appliances across the aggregator's portfolio render the process of control and scheduling infeasible without automating a large part, or the whole, of the process. Learning from interaction and acting according to the consumers' preferences is important for DR control systems. As already stated in Section 3.1.3, the most widely used RL algorithm in DR is Q-learning. While it is an online method and offers convergence guarantees, using tabular methods such as Q-learning can be challenging when the space of actions and environment states becomes large [101]. This can be a problem especially at the service provider level, where the quantity and variety of DR units and different environments are far greater than in a household or an office building. There is work where researchers try to alleviate this issue by approximating the action-value function using an ANN [121] or using FQI [117,118,119,120]. The literature has also employed multi-agent RL methods to tackle the problem of the large state space [114,126,127].
Compared to traditional control mechanisms for DR, such as model predictive control (MPC), RL approaches do not generally require a model of the environment to be applied (although there are model-based RL algorithms) [101]. This provides an advantage in designing DR control systems that take into account consumers' preferences. Moreover, deep RL has been shown to work better in high-dimensional tasks [101]. In contrast, model-based control needs a model of the consumer and the participating agents. That problem is in general intractable and there is no feasible way to model all the involved agents beforehand, whereas with methods like reinforcement learning the preferences of the DR agents can be learned through interaction. Furthermore, the adaptive online nature of RL methods makes them more suitable for applications in dynamic environments, like the control of appliances and equipment for DR, whereas the successful application of MPC methods depends heavily on the quality of the prior knowledge regarding the system dynamics [280]. In an era where DR-related data are becoming more abundant, AI approaches for control are able to provide more personalised DR services. On the other hand, MPC methods are a more mature technology with inherent constraint handling, and a mature feasibility and robustness theory [281]. Another big issue of RL in general, with implications for its correct application in DR, is the design of reward signals [101]. There have been quite a few cases where RL agents have found unexpected ways to make their environments deliver reward, but with undesirable policies [101]. In the energy DR literature, to the best of our knowledge, this is a heavily under-researched topic.
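The tabular Q-learning update referred to above can be sketched on a toy load-shifting task. Everything in this example is an assumption made for illustration: a single deferrable appliance must run once within three time slots with hypothetical prices, the reward is the negative price paid, and missing the deadline incurs a fixed penalty.

```python
import random

PRICES = [3.0, 1.0, 2.0]     # hypothetical price per slot
MISS_PENALTY = 10.0          # penalty if the appliance never runs
WAIT, RUN = 0, 1


def step(slot, action):
    """Return (reward, next_slot); next_slot is None when the episode ends."""
    if action == RUN:
        return -PRICES[slot], None
    if slot == len(PRICES) - 1:
        return -MISS_PENALTY, None
    return 0.0, slot + 1


def greedy(q_row):
    """Action with the highest Q-value (ties resolved in favour of WAIT)."""
    return max((WAIT, RUN), key=lambda a: q_row[a])


def train(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in PRICES]          # q[slot][action]
    for _ in range(episodes):
        slot = 0
        while slot is not None:
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                action = rng.choice((WAIT, RUN))
            else:
                action = greedy(q[slot])
            reward, nxt = step(slot, action)
            future = 0.0 if nxt is None else max(q[nxt])
            # Q-learning update towards the bootstrapped target.
            q[slot][action] += alpha * (reward + gamma * future - q[slot][action])
            slot = nxt
    return q


q_table = train()
policy = [greedy(row) for row in q_table]
print(policy)  # learned action per slot: 0 = WAIT, 1 = RUN
```

The table has one entry per state-action pair, which is manageable for three slots and two actions but grows quickly once several appliances, comfort states and price levels are combined; this is the scalability issue that motivates function approximation.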

Scheduling
Nature-inspired algorithms are the most frequently utilised methods for scheduling tasks. In general, the scheduling problem can be highly complex, non-linear, and non-convex. This group of algorithms is able to find promising solutions in a reasonable time due to their exploration and exploitation ability [128]. Other key advantages are that they are robust and adaptable to changing conditions and environments, are parallel algorithms, and can incorporate mechanisms to avoid getting trapped in local optima [128]. Moreover, this group of algorithms often has good "anytime" properties, in the sense that they return promising solutions even if the computation is stopped early. This is an important property in real applications, where there are often physical limitations in the hardware and processing time available. On the other hand, nature-inspired methods do not offer the guarantee of finding an optimal solution, and specific algorithms have their own drawbacks. For example, GAs, if not properly tuned, can suffer from premature convergence and unpredictable results, and sometimes use complex, not always intuitive functions in selection and crossover operators, while PSO can get stuck in local optima and converge slowly [128]. Nature-inspired AI has also been employed for the design of pricing schemes, where the service provider tries to find the prices for DR which will optimise their profit while taking into account consumers' preferences and network constraints. The NSGA algorithm, and its variations, have been applied in the multi-objective, Pareto efficient scheduling of loads for DR [144,145,146,147,148].
Classical algorithms used for solving DR scheduling problems are linear programming (LP), nonlinear programming (NLP), mixed-integer linear programming (MILP), and mixed-integer nonlinear programming (MINLP), depending on the formulation of the scheduling problem [282]. The primary advantages of the population-based, stochastic, nature-inspired AI methods over the deterministic classical DR scheduling methods are that they can handle tasks with a large number of decision variables and can adapt to changes in scheduling for DR [128]. These abilities are important because they can result in adaptive DR systems that are able to handle changes and interruptions in the scheduling of appliances and relevant equipment efficiently. Mathematical optimisation/scheduling methods usually rely on some implicit assumption, such as the system being linear or the search space being convex. However, real-life DR systems are increasingly composed of many heterogeneous devices of different types (e.g. batteries, HVAC units, industrial devices, EVs etc.), which means the control problem is often non-linear in nature. For such non-linear optimisation problems, AI methods (such as GAs, NSGA or PSO) often perform better than traditional approaches [282].
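A minimal genetic algorithm for a toy scheduling task illustrates the population-based search and the "anytime" property discussed above. The problem, tariff and operators are hypothetical: an appliance must run in exactly three of eight slots at minimum cost, a chromosome is the list of chosen slots, and elitism keeps the best schedule found so far, so stopping early still returns a valid schedule.

```python
import random

PRICES = [0.30, 0.28, 0.12, 0.10, 0.11, 0.25, 0.35, 0.40]  # hypothetical tariff
SLOTS_NEEDED = 3                                             # slots the appliance must run


def cost(slots):
    """Cost of running the appliance in the given slots."""
    return sum(PRICES[t] for t in slots)


def crossover(a, b, rng):
    """Child draws its slots from the union of both parents' slots."""
    pool = sorted(set(a) | set(b))
    return sorted(rng.sample(pool, SLOTS_NEEDED))


def mutate(slots, rng, rate=0.3):
    """With probability `rate`, swap one chosen slot for an unused one."""
    slots = list(slots)
    if rng.random() < rate:
        free = [t for t in range(len(PRICES)) if t not in slots]
        slots[rng.randrange(len(slots))] = rng.choice(free)
    return sorted(slots)


def genetic_schedule(generations=40, pop_size=12, seed=1):
    rng = random.Random(seed)
    naive = list(range(SLOTS_NEEDED))  # "run as early as possible" baseline
    population = [naive] + [
        sorted(rng.sample(range(len(PRICES)), SLOTS_NEEDED)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        children = [min(population, key=cost)]  # elitism: keep the current best
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            p1 = min(rng.sample(population, 2), key=cost)
            p2 = min(rng.sample(population, 2), key=cost)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = children
    return min(population, key=cost)


best = genetic_schedule()
print(best, cost(best))
```

Because the baseline schedule is part of the initial population and the best individual always survives, the result is never worse than the baseline; the sketch does not, however, guarantee that the optimal schedule is found, in line with the limitation noted above.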

Multi-agent systems and incentive design
While traditional DR approaches assume there is direct control of the devices being managed, real-life DR systems are increasingly an aggregation of a large number of devices (building HVAC, EVs, water tanks etc.) that are under the control of different entities/parties who may have their own interests and objectives, not always aligned with those of the DR system operator. For such systems, multi-agent methods or those from game-theoretic mechanism design are increasingly important. In the reviewed literature, researchers have primarily applied multi-agent systems for the design of pricing/incentive mechanisms. Mechanism design has been used to design DR schemes which will have certain advantageous properties and satisfy specific conditions. While these methods provide significant insights into the behaviour of distributed DR systems composed of self-interested parties, they are often dependent on the modelling assumptions made. Where these assumptions do not hold in real life, the resulting schemes will not necessarily have the expected properties. Coalitional game theory has been applied in the design of incentive-based DR schemes and the distribution of the expected payoff to the participants. It is heavily used in incentive-based DR due to the contractual agreements between the service provider and the participants in a DR programme. On the other hand, computational complexity and intractability are issues that need to be addressed for these methods to be more widely applicable. Hybrid methods, which could include function approximation (such as the work of O'Brien et al. [226] and Bakr and Cranefield [227]) and efficient search, could be a potential path for addressing these challenges.
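One standard way to distribute a coalition's payoff among its members is the Shapley value, which averages each member's marginal contribution over all orders in which the coalition could be formed. The sketch below is illustrative only: the choice of the Shapley value, the three consumers, their reductions, the threshold and the payment rate are all assumptions made here. It also shows the complexity issue mentioned above, since exact computation enumerates all n! orderings.

```python
from itertools import permutations

REDUCTION_KW = {"A": 2.0, "B": 3.0, "C": 5.0}   # hypothetical reductions per consumer
MIN_REDUCTION_KW = 5.0                           # coalition is paid only above this
PAYMENT_PER_KW = 10.0


def coalition_value(members):
    """Payment earned by a coalition: zero unless the threshold is met."""
    total = sum(REDUCTION_KW[m] for m in members)
    return PAYMENT_PER_KW * total if total >= MIN_REDUCTION_KW else 0.0


def shapley_values(players):
    """Exact Shapley values by enumerating every ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        joined = []
        for player in order:
            before = coalition_value(joined)
            joined.append(player)
            totals[player] += coalition_value(joined) - before
    return {p: totals[p] / len(orderings) for p in players}


shares = shapley_values(sorted(REDUCTION_KW))
print(shares)  # approximately A: 18.33, B: 23.33, C: 58.33
```

The shares sum to the value of the full coalition (100 in this example), and consumer C, who can meet the threshold alone, receives more than a payment proportional to its reduction would give.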

Discussion of AI methods in DR schemes and consumer types
As displayed in Fig. 9, the primary focus of the surveyed literature has been on price-based programmes, with price-based related papers constituting half of the reviewed literature. The most common types of programmes in the surveyed papers are RTP [144,145,172,198,251,283], dynamic pricing [80,107,108,202,284,285], ToU programmes [64,71,148,165,189], and inclining block rate programmes [69,71], among others. In terms of AI methods for price-based DR schemes, machine learning, ANNs, and nature-inspired AI techniques are the most frequent approaches, in (almost) equal proportion. ANNs with 1 hidden layer are the most commonly applied AI subgroup for price-based programmes, and have been mainly utilised in load and price forecasting. Besides ANNs, supervised learning approaches have also been applied for forecasting applications in price-based DR. Nature-inspired algorithms have also been heavily studied for price-based DR. The majority of the nature-inspired AI methods have been applied for the scheduling of DR resources under varying pricing tariffs. This could be attributed to their relatively low computational complexity and ability to find solutions in a reasonable amount of time, which is especially advantageous under real-time pricing schemes and with sudden changes in schedules.
On the other hand, incentive-based (or contract-based) DR appears, so far, to have attracted comparatively less interest than price-based DR from the AI research community. The main type of incentive-based programme found in the literature has been direct load control [70,118,120,218,267], mainly of thermostatically controlled loads and EVs. The most widely used AI approach is reinforcement learning, which is used to control and schedule devices, while exploring the environment and learning through interaction with the user. Moreover, cooperative game theory and mechanism design methods have been mainly studied for incentive-based programmes due to the contractual nature of these schemes and the need for designing fair and incentive-aligned DR programmes, as well as for rewarding DR participants in a fair and stable manner [234].
Finally, a high percentage of the unsupervised algorithms are agnostic to the DR scheme type. These unsupervised algorithms have been primarily employed for load clustering in the reviewed literature. Moreover, it is also worth noting that ANNs and supervised learning techniques have been applied irrespective of the DR scheme type, as they can be used to obtain forecasts for both price-based and incentive-based DR schemes.
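As an illustration of load clustering, the sketch below groups daily load profiles with a plain k-means routine. K-means is chosen here only as a familiar example of an unsupervised method; the four-block profiles (night, morning, afternoon, evening, in kW), the initial centroids, and the function names are hypothetical. The result does not depend on any pricing or incentive information, which is why such methods are agnostic to the DR scheme type.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, initial_centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids (deterministic)."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in profiles
        ]
        # Update step: each centroid moves to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, label in zip(profiles, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical average load (kW) per block: night, morning, afternoon, evening
profiles = [
    [0.3, 0.5, 0.4, 1.8],  # evening-peaking household
    [0.2, 0.6, 0.5, 2.0],  # evening-peaking household
    [0.2, 1.5, 1.6, 0.4],  # daytime-peaking household
    [0.3, 1.7, 1.4, 0.5],  # daytime-peaking household
]

# Seed the two clusters with one profile of each visible shape
labels, centroids = kmeans(profiles, [profiles[0], profiles[2]])
print(labels)
```

An aggregator could use the resulting groups to decide which consumers to target for an evening or a daytime DR event.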
Regarding research related to the consumer type, as displayed in Fig. 10, the surveyed work has primarily applied AI approaches for residential applications or was agnostic to the type of end-users. This could be attributed to the current trend of including residential end-users in the flexibility offers to balance a supplier's portfolio, or to maintain the system's frequency and/or voltage. AI methods provide useful tools for addressing the challenges inherent in providing DR services with a large number of different end-users. Moreover, for almost every category of AI approaches, besides cooperative game theory and mechanism design techniques, the largest part of the reviewed literature is focused on the residential setting. Next, there is a relatively high proportion of the total surveyed papers where the proposed frameworks have been agnostic to the type of end-users. The majority of the surveyed papers using cooperative game theory and mechanism design belong to this category. This could be because they use abstractions which can handle all types of agents (residential, commercial, and industrial end-users). On the other hand, a relatively small part of the explored literature has applied AI methods considering only industrial [54,64,76,148,149,248] or commercial [114,179,188,194] consumers, as these are already well-known applications that require less coordination and data management. To conclude, Fig. 10 could indicate that there is a trend towards researching AI solutions to effectively utilise all types of loads in a flexibility portfolio.
Fig. 9. Application of various AI research method groups in DR scheme categories.

Research evolution and recommendations for the future
As shown in Fig. 1, there is a sudden increase in research papers using AI approaches (especially ML and ANNs) for DR applications from 2013 onwards. This growing trend can be attributed to the rise in popularity of both AI approaches and DR. In Fig. 11 we clearly see that the usage of AI approaches has increased across all DR application areas, with the majority of the examined papers using AI techniques for forecasting and for scheduling and control tasks. Additionally, we found that a large part of the literature, from 2013 onwards, has applied AI methods for residential and small-scale industrial/commercial DR. This coincides with the need to increase the share of small-scale participants in DR schemes. Although at first participants in DR programmes were large industrial entities [249], which consumed considerable amounts of electricity, going forward residential and small industrial/commercial entities will increasingly need to be brought into DR programmes to achieve higher adoption of DR. AI techniques have been used to address the complexities of residential entities by automating the decision-making process and control of DR appliances, based on consumers' preferences and behaviour. AI approaches have also been used to create better forecasts for demand and electricity prices, and to develop more accurate control and scheduling frameworks and better tools for decision making, compared to the traditional modelling approaches.
Despite the great progress achieved by using AI approaches for DR, we identify a number of challenges which should be addressed by future research. First, DR agents need to function in a partially observable environment, i.e. agents cannot have perfect knowledge of the units used for DR, the other agents, and the environment [10]. In the reviewed literature, there is some work which addresses the problem of partial observability either directly with the use of partially observable MDPs [114,283] or indirectly via function approximation [119,121]. So far, to our knowledge, the largest share of the existing research assumes fully observable tasks (e.g. formulation of the DR problem as a fully observable MDP). The incorporation of partial observability and incomplete knowledge of the DR environment in AI models can pave the way for agents which operate under diverse environments and various constraints, at both the consumer and the service provider level. Furthermore, we recommend that future DR models be multi-agent and consider the objectives and actions of the various parties participating in DR programmes. While there is work assuming multi-agent environments (e.g. cooperative game theory, multi-agent RL), a large portion of the examined research models DR as a centralised, single-agent task, where other entities are not considered as agents with their own objectives, but as part of the environment.
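To illustrate what partial observability means in practice, the sketch below shows the belief update at the core of a partially observable MDP: the agent cannot see the true state, so it maintains a probability distribution over states and revises it with Bayes' rule after each action and observation. The states, the action, the observation, and all probabilities are hypothetical and chosen purely for illustration.

```python
def belief_update(belief, transition, obs_likelihood, action, observation):
    """One Bayes-filter step over a discrete hidden state.

    belief[s]             : prior probability of state s
    transition[a][s][s2]  : P(next state s2 | state s, action a)
    obs_likelihood[s2][o] : P(observation o | next state s2)
    """
    states = list(belief)
    # Prediction step: push the belief through the transition model
    predicted = {
        s2: sum(belief[s] * transition[action][s][s2] for s in states)
        for s2 in states
    }
    # Correction step: weight each state by how well it explains the observation
    unnormalised = {s: predicted[s] * obs_likelihood[s][observation] for s in states}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {s: v / total for s, v in unnormalised.items()}

# Hypothetical example: is a dwelling occupied, judged from a noisy meter reading?
prior = {"occupied": 0.5, "empty": 0.5}
transition = {
    "wait": {
        "occupied": {"occupied": 0.9, "empty": 0.1},
        "empty": {"occupied": 0.2, "empty": 0.8},
    }
}
obs_likelihood = {
    "occupied": {"high_load": 0.8, "low_load": 0.2},
    "empty": {"high_load": 0.1, "low_load": 0.9},
}

posterior = belief_update(prior, transition, obs_likelihood, "wait", "high_load")
print(posterior)
```

A controller would then choose its next action based on this belief rather than on a state it is assumed to know.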
Furthermore, forecasting techniques can help with addressing the stochasticity in DR models, but forecasts need to become increasingly accurate, span multiple horizons in time and space, and better quantify the inherent uncertainty [247]. Additional considerations include the scalability of the proposed AI methods, especially when non-parametric methods are employed (e.g. [68,107,108]), and the heterogeneity of consumers and appliances when modelling DR. There is also a need to develop models and results that are generalisable to wider settings, and a need to assure the reproducibility of results (lack of modelling details is a key problem across a wide part of the reviewed literature). Moreover, there is increasing interest in using AI and ML for integrated demand response from other energy vectors (gas, heating networks, etc.) [22] that interact with the power system, and this is expected to play an increasing role in the future.
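One common way to assess how well a forecast quantifies uncertainty is to score quantile forecasts with the pinball (quantile) loss. The sketch below is a minimal illustration of that metric and is not drawn from the reviewed papers; the function name and the sample values are hypothetical.

```python
def pinball_loss(actual, forecast, quantile):
    """Mean pinball loss of quantile forecasts; lower is better."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must lie strictly between 0 and 1")
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, f in zip(actual, forecast):
        diff = y - f
        # Under-forecasts are weighted by q, over-forecasts by (1 - q)
        total += max(quantile * diff, (quantile - 1.0) * diff)
    return total / len(actual)

# Hypothetical hourly demand (kW) and a 90th-percentile forecast
demand = [10.0, 12.0, 9.0]
p90_forecast = [11.0, 11.0, 10.0]
print(pinball_loss(demand, p90_forecast, 0.9))
```

For a high quantile such as 0.9, falling below the realised demand is penalised more heavily than overshooting it, which matches the intended meaning of an upper bound on demand.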
Summing up, we believe a potential way forward for the research could be to adopt more multi-agent frameworks, where agents are able to function under a partially observable, stochastic environment, while at the same time relaxing the assumptions about the preferences and behaviour of participating entities (e.g. price elasticity of electricity consumers, economic rationality, discomfort functions, etc.).

Conclusions
Electrical grids are facing new challenges, such as the increasing share of DER and the growing adoption of new loads like EVs and heat pumps. To address these challenges, there has been a growing interest in DR solutions, as they allow grid operators to maintain the electrical grid's balance at a low cost, while avoiding or delaying the need for costly reinforcements of the power networks or investment in large amounts of costly back-up generation. Although DR programmes originally targeted a small number of large industrial and tertiary consumers, there is currently a strong drive to include residential and small tertiary loads in the DR portfolio. This shift requires correctly selecting the end-users contributing to a specific consumption shift, as well as scheduling their consumption, controlling units for DR, and determining the reward/penalty schemes. To achieve these objectives, AI solutions have been extensively used by researchers in order to find solutions where traditional approaches could not provide results that are sufficiently efficient or reliable.
In this work, the authors have reviewed over 160 papers, published between 2009 and 2019, as well as 40 companies and commercial initiatives, and 21 large projects to identify and discuss the trends for AI approaches in the energy DR sector. The literature reviewed in this work shows that AI approaches are a promising technology for DR applications. Going forward, adoption of AI is paramount for the wide success of DR schemes. Even though AI approaches offer tools to tackle many challenges of the DR schemes, they also pose a series of considerations and limitations. Better understanding of the methods and their limitations is vital for their proper application in the DR setting.
Our review highlighted that a large number of different AI techniques are being used, but it is clear that some techniques are more suitable than others for specific tasks. Indeed, it is shown that ANNs, which are commonly used for multi-variable function approximation and regression, are extensively used for short-term load and price forecasting, using supervised learning to achieve accurate prediction. In contrast, algorithms using RL are often used to capture human feedback, which makes them suitable for control tasks in HEMS that integrate a DR solution. On the other hand, unsupervised learning is mostly used for clustering when there is no prior knowledge of the categories, which is mostly the case for DR customer clustering tasks at the aggregator level. Finally, once DR customers have been categorised and their consumption has been forecast, aggregators schedule the activation of DR participants and plan their rewards and penalties. Different approaches have been highlighted for these tasks, among them optimisation, which can require the use of nature-inspired optimisation techniques (e.g. swarm intelligence) where traditional, deterministic optimisation methods are less accurate. Other approaches use multi-agent systems within game-theoretic environments to determine the optimal pricing and scheduling strategy.
Our work also showed that this growing interest of the research community in AI solutions for the DR sector is also felt in the industrial sector, where numerous start-ups created in the last few years have adopted the same trends highlighted above. Nevertheless, even if these trends for the use of AI in DR are well established, more research is clearly needed to identify the optimal solutions in many cases. Indeed, many of the proposed solutions lack testing and validation through real-life trials and experimentation conducted at large scale. Hence, additional research initiatives along with industrial projects and large-scale experimentation are still necessary to allow the emergence of more accurate models and AI solutions. This path will allow AI/ML techniques to become mainstream, or business-as-usual, in the energy DR sector.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. See Table 1, Table 2, and Table 3.

Table 1
Summary of the papers reviewed.

Ref Year Method(s) Objective(s)

Scheduling and Control of Loads for DR