Data analytics in the electricity sector – A quantitative and qualitative literature review

Data Analytics applications and methods. In addition, we discuss used data sets, feature selection methods, benchmark methods, evaluation metrics, and model complexity and run time. Summarizing the findings from the different areas, we identify best practices and what researchers in one area can learn from other areas. Finally, we highlight potential for future research.


Introduction
The ongoing decarbonization, decentralization and digitalization of the electricity sector drive the importance of Artificial Intelligence in general, and Data Analytics in particular. On the one hand, push factors, such as the declining costs of information and communication technology as well as the advances in computing power, lead to an increasing availability of data and new opportunities for its analysis. On the other hand, pull factors, such as the increasing volatility of electricity generation due to a growing share of renewable energy sources, and a rising number of active actors in the electricity system, increase complexity and create new needs for Data Analytics.
Driven by these new opportunities and needs, numerous new methods and fields of application are emerging, and research is becoming more specialized and fragmented. Data Analytics studies today span different areas, applications and approaches across fields.

Table 1
Literature reviews in sub-domains of Data Analytics in the electricity sector.

Study          Area-specific   Application-specific   Approach-specific
[1-4,11,12]    X
[5,6]                          X
[7-9]                                                 X
[13-22]        X               X
[23]           X               X
[10]           X               X                      X

In order to enable researchers and practitioners to apply and advance state-of-the-art methods effectively in the future, it is important to integrate and structure the comprehensive body of existing scientific work. This calls for a holistic review across all important areas, applications and approaches of Data Analytics in the electricity sector. Therefore, we attempt to make two key contributions with this paper: 1. We quantitatively capture the big picture of Data Analytics research in the electricity sector, thus displaying the high-level status quo of research activity. 2. We qualitatively analyze over 200 high-impact studies in depth, thus laying out the inner mechanics of current Data Analytics research, identifying best practices from different areas, and deriving suggestions for future research.
For our qualitative analysis, we review in detail the data sets, feature selection methods, evaluation metrics, benchmark methods, and state-of-the-art methods used in each reviewed paper. Where sub-domain literature reviews exist, we acknowledge and reference them as a source of additional valuable information for the interested reader.
The remainder of this paper proceeds as follows. Section 2 provides definitions of the dimensions along which studies in this review are categorized. In Section 3, we describe the methodology used for searching and selecting the most relevant literature. The results are presented in two ways: Section 4 presents a quantitative analysis, delivering a high-level overview of the landscape of Data Analytics research in the electricity sector. Section 5 then provides a structured in-depth review of the most influential studies. Finally, in Section 6 we summarize the review findings, derive best practices, and outline potential key trends for future research.

Definitions and dimensions of analysis
We structure our review along three dimensions: area, application and approach, which we describe in more detail in the following.
Area . The electricity system value chain is composed of multiple components. The present study derives the following categories: (i) Generation, (ii) Trading, (iii) Transmission and Distribution, (iv) Consumption, and (v) System. Generation is the production of electric energy carried out in power plants, while trading refers to the buying and selling of electricity on wholesale markets. Transmission and Distribution denotes the delivery of electricity via grids. Consumption is the demand and end-usage of electricity. Studies that contemplate the system as a whole and simultaneously assess multiple areas are grouped in the System area.
Application. This study defines application as the specific task or activity on which an investigation focuses. Based on typical applications from the Data Analytics literature, four categories are defined: (i) Forecasting and Prediction (Supervised Data Analytics), (ii) Clustering (Unsupervised Data Analytics), (iii) Monitoring and Controlling (both supervised and unsupervised), and (iv) Other. Forecasting and Prediction are both concerned with the estimation of outcomes for unseen data in the future; because many authors use the terms 'prediction' and 'forecasting' as synonyms, the first category covers both. Clustering, on the other hand, is the aggregation of objects into homogeneous groups. Monitoring and Controlling are related terms; both involve a process of observation and measurement of performance in order to take corrective action if necessary.
Approach. To compress the exceptionally large number of single and combined methods existing in Data Analytics research, this review defines eight groups of approaches that represent the third perspective of analysis of each reviewed paper: (i) Time Series, (ii) Regression, (iii) Neural Networks, (iv) Support Vector Machines, (v) Tree-based Approaches, (vi) Clustering Approaches, (vii) Hybrid Approaches, and (viii) Other Approaches. In addition, other literature reviews form their own category (ix) due to their different investigative objectives. This study categorizes an approach as Time Series if it falls into one of the following families: autoregressive integrated moving average (ARIMA), generalized autoregressive conditional heteroskedasticity (GARCH), Kalman filtering (LQE), Grey system theory (GST), exponential smoothing, or transfer functions (TF). Regression can be defined as an approach used to identify a relationship between the explanatory and the dependent variables [19]. Apart from support vector regression (SVR) and the regression tree, all types of regression -including linear, logistic, logic, and quantile regression -belong to this category. Artificial Neural Networks (ANNs) are Machine Learning approaches inspired by cells in the brain. Similar to brain neurons, artificial neurons are connected with each other in multiple layers, forming a network [19]. The network can adopt multiple architectural forms, which we summarize under the term ANNs. Support Vector Machines (SVMs) are a Machine Learning approach for classification and regression problems [19]. When used for regression, the approach is known as SVR. Tree-based approaches develop a tree to predict an outcome from input variables; they can be used for classification and regression. Related approaches are, e.g., random forests, boosting and bagging, as well as Extra Trees. Clustering approaches aggregate objects in homogeneous groups, in other words, clusters.
Two clustering families exist -hierarchical and partitioning approaches. We categorize an approach as a Hybrid if it combines two or more approaches from the classes defined above. This excludes models which use a second approach only for pre-processing. If an approach cannot be allocated to any category, it is defined as Other Approach. Fig. 1 gives an overview of the relationships among these three dimensions, together with examples. A typical study in our review uses real-world data from an area, introduces a certain application use case and presents one or more approaches. The results give new insights on both the respective area and the performance of the approach.

Methodology
In order to identify the main streams of relevant literature, we follow the three fundamental steps suggested by Webster and Watson [24]: (1) identify major contributions, (2) search backward, and (3) search forward. The scope of the present review is very broad compared with other review articles. Therefore, we enhance the conventional first manual step of identifying major contributions with a database query search and automatic filtering with data mining. Our methodology is presented in detail below to ensure transparency and validity. Fig. 2 gives an overview of the steps described in this section.

Selection of initial paper pool
The starting point for identifying literature for the review is a manual selection of highly relevant papers. The selection is performed with the help of experts in the field, taking into account the number of citations of a paper and the journal rank in terms of h-index and impact factor. The result of this step is the initial pool, consisting of 50 studies. 1

Evaluation and selection of most appropriate database and query
The second step in capturing high-impact literature for the review is an online database search. To this end, we evaluate different databases and search queries, and select the one best suited to the purpose. For database selection, the deciding factor is the number of studies of the initial paper pool it contains. This number must be maximized. We evaluate the established databases Web of Science, Scopus, Science Direct, IEEE Xplore, and Wiley Online Library. We select Web of Science, because it is the database containing the highest number of studies, i.e. 44 of the 50 studies in the initial pool.
Next, a search string is constructed that searches the titles and abstracts of all articles in the respective database. When constructing the search string, three aspects are taken into account: the consistency of the query, the number of papers of the initial pool found with it, and the total number of papers retrieved by it. After assessing 10 different queries, a query that best balances the three aspects is selected.
The search string is composed of four parts, which are linked with the logical AND. The first part of keywords refers to the general object of analysis in a paper, such as electricity. The second part refers to the area or subtopic, for instance transmission. In the third part, the keywords refer to the applications of the study, e.g. load frequency control (LFC). Finally, the fourth part consists of approaches that might be used, such as neural networks. The keywords within each part are linked with the logical OR. 2

(electric* OR energy OR power OR load OR radiation OR "smart meter$" OR lines OR voltage) AND (customer$ OR consum* OR demand OR generation OR transmission OR distribution OR retail OR "short term" OR "long term" OR loss* OR stability OR system$ OR solar OR price$) AND (cluster* OR segment* OR forecast* OR predict* OR detect* OR analy* OR simulat* OR applicat* OR implement* OR monitor* OR control* OR characteriz* OR "LFC") AND (technique$ OR model OR data OR "artificial intelligence" OR "learning machine" OR "machine learning" OR "time series" OR "regression analysis" OR "decision tree" OR "neural network$" OR "ANN" OR "support vector" OR "deep learning" OR "data mining" OR "ARIMA" OR "ARMA" OR "ANFIS")

The search is performed using the selected string on the chosen database in February 2019. In total, 7708 papers are retrieved.

Automatic filtering
Of the retrieved articles, those most relevant and suited are identified using a text mining algorithm. The algorithm's goal is to determine the most relevant documents in relation to the given search query. It is implemented using the programming language R.
First, the search string is disaggregated into a list of 20,834 queries that contains all possible combinations resulting from the selection of one keyword from each of the four category blocks of the aggregated query. Second, a Vector Space Model is constructed, using the disaggregated search strings and the abstracts of the 7708 documents retrieved in the previous step. The Vector Space Model is an algebraic model that involves two steps: the representation of each document as a vector of the words that occur within it, and the transformation of the vectors into a numerical format. When breaking the documents into vectors, preprocessing steps are applied in order to remove stop words, numbers, extra white spaces and punctuation, and to reduce the remaining words to their word stem. For the second part of the Vector Space Model, a Term Document Matrix is constructed. This is a method of representing document vectors in a matrix format, where rows stand for all the terms present in at least one of the documents, and columns represent the document vectors across all terms. In this case, a cell value in the matrix is filled with the number of times the particular term is present in the particular document. If the term is not present in the document, the cell value is 0.
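The Term Document Matrix construction described above can be sketched in a few lines of Python (the document texts, the tiny stop-word list, and the tokenizer are illustrative assumptions; stemming is omitted, and the authors' actual implementation is in R):

```python
import re
from collections import Counter

def tokenize(text):
    # Minimal preprocessing: lowercase, keep alphabetic tokens only
    # (this drops numbers and punctuation), remove a tiny stop-word list.
    stopwords = {"a", "an", "and", "the", "of", "with", "for", "in"}
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in stopwords]

def term_document_matrix(documents):
    # Rows = terms present in at least one document, columns = documents,
    # cell = raw count of the term in that document (0 if absent).
    counts = [Counter(tokenize(d)) for d in documents]
    terms = sorted(set().union(*counts))
    return terms, [[c[t] for c in counts] for t in terms]

docs = ["We forecast electricity demand with a neural network.",
        "A clustering technique segments smart meter data."]
terms, tdm = term_document_matrix(docs)
print(len(terms), len(tdm[0]))  # number of terms, number of documents
```

Each column of `tdm` is the count vector of one abstract or disaggregated query; these vectors feed the similarity computation described next in the text.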
We define articles as relevant when they have a high similarity to the search string. Because documents and queries are represented as vectors, the angle between the vectors can be used as a similarity measure. The cosine similarity between two documents d and q on the vector space is a measure that calculates the cosine of the angle between them, according to Eq. (1):

cos(θ) = (d · q) / (‖d‖ ‖q‖)    (1)
This metric is a measure of orientation and not magnitude, since it focuses on the angle between the documents, and not the magnitude of each word count. In this sense, the cosine similarity is advantageous because even if two similar documents are far apart according to the Euclidean distance -due to the difference in size -they will still be grouped close together. However, a document containing the words from a string vector several times will not be closer to that vector than a document with the words appearing just once.
After the calculation of the cosine similarity between each paper vector and each search string vector, each document is assigned its highest obtained score, i.e. the highest cosine similarity obtained with any of the string vectors. As a result of this first step of filtering, the top 1000 papers with the overall highest similarity score are selected.
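The scoring step above can be sketched as follows, with hypothetical documents and queries (raw term counts, no stemming): each document keeps its maximum cosine similarity over all queries, and the top-k documents are selected (top 1000 in the paper, top 1 in this toy example).

```python
import math
import re
from collections import Counter

def vectorize(text, vocabulary):
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[t] for t in vocabulary]

def cosine_similarity(a, b):
    # Eq. (1): cos(theta) = (a . b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

docs = {"p1": "solar power forecasting with neural networks",
        "p2": "market design for gas pipelines"}
queries = ["solar forecasting neural network", "electricity price prediction"]
vocab = sorted({w for t in list(docs.values()) + queries
                for w in re.findall(r"[a-z]+", t.lower())})

# Each document is assigned its best score over all disaggregated queries ...
scores = {name: max(cosine_similarity(vectorize(text, vocab),
                                      vectorize(q, vocab)) for q in queries)
          for name, text in docs.items()}
# ... and the k highest-scoring documents are kept.
top = sorted(scores, key=scores.get, reverse=True)[:1]
print(top)
```

Because cosine similarity normalizes by vector length, a long abstract is not penalized relative to a short one, which is exactly the magnitude-invariance property discussed above.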

Abstract filtering
A second manual step of filtering is performed by reading and evaluating the abstracts of the top 1000 documents. First, we exclude studies which use only physical or engineering methods. In addition, we rule out studies that cover the application of Data Analytics in an energy sector that does not include electricity, such as natural gas. Following this step, 514 papers remain, which form the quantitative analysis pool depicted in Fig. 2, used to carry out the quantitative analysis of electricity analytics research in Section 4. The pool is later refined for the qualitative analysis, as described in the paragraphs below.

Manual filtering
In order to conduct a qualitative analysis of the studies within the scope of this review, a more finely tuned pool of literature is needed. With this objective, the papers are grouped by area and year. Within each group, they are then ordered according to their number of citations. The number of studies to select from each group is defined according to the proportion that each group represents in the quantitative analysis pool. The grouping thus has two purposes: to control the influence that the year of publication has on the number of citations, and to ensure that the proportion of articles in each area remains the same as before. Based on these criteria, the documents with the highest number of citations are selected from each group. Following this second step of filtering, 147 studies remain.
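The grouping and proportional selection can be sketched as follows (the paper records, field layout, and target total are invented for illustration; the paper selects 147 studies from 514):

```python
from collections import defaultdict

# Hypothetical records: (title, area, year, citations).
papers = [
    ("A", "Consumption", 2015, 120), ("B", "Consumption", 2015, 80),
    ("C", "Consumption", 2015, 30),  ("D", "Generation", 2017, 60),
    ("E", "Generation", 2017, 10),   ("F", "Trading", 2016, 45),
]
target_total = 4  # 147 in the paper; 4 here for illustration

# Group by (area, year) so that publication year does not distort
# citation counts and each area keeps its share of the pool.
groups = defaultdict(list)
for p in papers:
    groups[(p[1], p[2])].append(p)

selected = []
for group in groups.values():
    # Each group's quota is proportional to its share of the whole pool.
    quota = round(len(group) / len(papers) * target_total)
    group.sort(key=lambda p: p[3], reverse=True)  # citations, descending
    selected += group[:quota]
print([p[0] for p in selected])
```

Note that rounding each group's quota independently can make the total drift slightly from the target; a real implementation would need a rule for distributing the remainder.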

Backward and forward search
To ensure that the most relevant literature is analyzed, backward and forward searches are conducted. The backward search is the revision of papers cited by the articles that are currently part of the literature list, thus determining prior studies that should also be included. The forward search, on the other hand, is the identification of papers that cite the articles that are included in the literature list, thus determining subsequent studies that should be included.
As part of the backward search, all papers that are cited by at least 10 of the articles on the current literature list are included. Following this step, there are 9 new studies on the list. For the forward search, papers that cite the articles on the current literature list, and have an above-average number of citations in relation to them, are included. In the course of this step, 16 new studies are added to the list. Finally, the literature list is merged with the initial pool, excluding duplicates. The resulting qualitative analysis pool includes a total of 205 studies which are reviewed in Section 5.

Table 2
Top 10 countries by number of publications (recoverable rows): Data Analytics in the electricity sector vs. All topics.

Rank   Electricity sector   All topics
3      India                United Kingdom
4      Iran                 Germany
5      Turkey               Japan
6      Spain                France
7      Taiwan               Canada
8      Korea                Italy
9      Australia            India
10     Malaysia             Spain
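The backward-search criterion, include every reference cited by at least a threshold number of papers on the current list, can be sketched with a simple citation count (the citation map is invented, and the threshold is lowered from the paper's 10 for this toy example):

```python
from collections import Counter

# Hypothetical citation data: paper on the literature list -> references it cites.
references = {
    "p1": ["r1", "r2"],
    "p2": ["r1", "r3"],
    "p3": ["r1", "r2"],
}
THRESHOLD = 2  # 10 in the paper; lowered for this toy example

# Backward search: count how often each reference is cited by papers
# on the current literature list, and keep those above the threshold.
counts = Counter(ref for refs in references.values() for ref in refs)
backward_hits = sorted(r for r, n in counts.items() if n >= THRESHOLD)
print(backward_hits)
```

The forward search works in the opposite direction and would need an external citation index (e.g. Web of Science) to find the papers citing each list entry.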
It should be noted that, due to the broad scope of this review, we concentrate on the most important studies and fields of research with the highest impact. Other studies related to Data Analytics in the electricity sector exist, but are not at the center of the research focus.

Quantitative overview
The paper pool with 514 studies, obtained after the abstract filtering explained in Section 3.4, is used for a high-level overview of publications.

Literature growth
The body of Data Analytics-related work in the electricity context has been growing substantially and many publications on the subject exist. We compare the development to the trend of scientific publications in other major fields by using data from the portal SCImago Journal & Country Rank [69]. Among other things, the portal classifies the information contained in the Scopus database since 1996 into subject categories. The comparison of the literature growth across different topics is presented in Fig. 3. From Fig. 3, it is apparent that all topics have seen growth in the number of annually published papers. Starting around 2005, papers published in Artificial Intelligence outpace those published across All topics, Energy and Computer Science. This suggests that Artificial Intelligence is a topic that has received attention not only from the field of computer science, but persistently from the entire scientific community. Before 2010, the field of Data Analytics in the electricity sector received less attention than the other fields. Since then, its growth has become more pronounced than in any other field, surpassing even the Artificial Intelligence-related literature. This underlines the importance of a literature review in this field.

Country-Level analysis
Having established the increasingly high degree of interest in the topic of this study, we now explore the countries most interested in this research. The top 10 countries, ranked by total number of publications, are compared for All topics and Data Analytics in the electricity sector . The information for publications across All topics is again based on the SCImago Journal & Country Rank portal [69] . The results are presented in Table 2 .
These results suggest that there is a special interest in research on Data Analytics in the electricity sector in Asian countries -seven of the top ten countries in this subject are from Asia. Furthermore, the majority of these countries have a special focus on this topic: they play a major role here but are not included in the list of top countries ranked by overall research output. On the other hand, in several countries -most of them European -research on Data Analytics in the electricity sector is under-represented compared with the overall number of publications, specifically in the United Kingdom, Germany, Japan, France, Canada, and Italy.
Another aspect of the country-level analysis is the amount and weight of collaborative work being published. Fig. 4 presents a collaboration network, where the width of an edge represents the number of collaborations between two countries. For the sake of readability, only countries whose collaborative work represents at least 20% of their total research output are taken into account. The country codes are those defined in ISO [70].
When the network is examined, it becomes evident that China is at the center of the collaborative work in the research related to Data Analytics in the electricity sector . The United States also plays a major role, as several countries are included in the network because of their strong collaborative work with researchers from US-American institutes, such as Ecuador and Colombia. Finally, other countries with many strong ties are Australia, the United Kingdom, Taiwan, Canada, Iran, and Malaysia.

The three A's: Area, application, approach
Each of the studies reviewed in this paper is categorized along the three defined dimensions: area, application, and approach. Fig. 5 displays the proportion of applications used in the different areas.
The first essential finding is the importance of the Consumption area in Data Analytics research in the electricity sector. In fact, more than one-third of the reviewed literature corresponds to this area. The second-largest is the Generation area, followed by Transmission and Distribution, Trading, and System.
When considering the applications among the areas, Forecasting and Prediction plays the most important role across the electricity value chain. Monitoring and Controlling is also present, but plays a considerable role only in the Transmission and Distribution area and in the System as a whole. Moreover, Other applications are the focus of the System area. This can be explained by the fact that studies that do not focus on a specific area usually perform simulations and modeling before anything else. Finally, the application Clustering is rarely the main object of a study but plays a minor role in the Consumption area. However, clustering is often conducted as a pre-processing step within the studies.
Moving on to the analysis of the approach category, Fig. 6 shows a shift in the methodological focus across the years. The family of neural networks has played a major role since 1990 and still does in absolute numbers, but its share has been decreasing to give way to other methods. Three similarly highly cited reviews on neural networks along the energy value chain have been published by Kalogirou in that time [7,8,71]. He defines sub-areas that can be tackled by neural networks and describes one corresponding paper for each sub-area. Over the last few years, a range of approaches has emerged, with hybrid being the most frequent. This suggests that there exists a propensity to merge different techniques in an attempt to achieve better results. Other approaches that are gaining attention are tree-based and clustering methods. A deeper evaluation of the approaches used for each area and application is conducted in Section 5.

Programming languages
Based on the programming languages and statistical software reported by the studies in the qualitative analysis pool, their popularity in Data Analytics research can be observed. Fig. 7 depicts the numbers of reported languages, aggregated into three periods from 2005 to 2019. In all periods, MATLAB was the most popular language, used by over 50% of the studies. Yet, both Python and R have become increasingly popular in recent years: in the period from 2015 to 2019 they were used by 20% and 11% of studies, respectively. The share of "other" programming languages and software has been decreasing. That group includes Java, SPSS, Eviews, Rapid Miner, SAS, Excel, Microsoft Visual Basic, Minitab, LabView, and Weka, which are each named one to three times in the period from 2005 to 2019. It is notable that only about half of all studies report the programming language used (43% in 2005-2009, 50% in 2010-2014, and 53% in 2015-2019).

Qualitative review
We structure our review along the area dimension. Each area section starts off with a concise overview of the most relevant applications. For each application, we then examine the research in more detail. First of all, the used approaches, data sets and features are described. We then present benchmarks and their role in the respective area and application. We also discuss typical error measures, as well as the complexity and run time of approaches. Finally, notable state-of-the-art approaches are portrayed and a summary is given.

Generation
The volatility of solar and wind power represents a new challenge for successfully balancing supply and demand in electricity grids. The rising penetration of renewable electricity generation has therefore made accurate forecasting of the resources used a key topic of research. This is mirrored by the studies in our final pool: all but two studies in the generation area are affiliated with Forecasting. These studies are further classified according to the type of generation. Solar forecasting studies make up 60% (25 publications) of the pool, wind accounts for 33% (14 publications), 5% (2 publications) relate to two or more generation types, and one paper [72] is exclusively related to hydro power. Within wind and solar generation, we identified several reviews [5,37,62,66,73-80].
In the case of [9], the author also reviews examples of solar radiation predictions, as well as modelling and forecasting in energy engineering systems in general, but focuses exclusively on the application of ANNs. The two studies which are not related to forecasting cover the topic of solar power control. The authors of Hiyama et al. [81] use an ANN to learn the optimal operating voltage of a PV system. This optimal value is then fed back to the PV inverter to adjust the terminal voltage of the system. In Chia et al. [82], the authors use a support vector machine to control the energy flow. The study is further reviewed in the section on Transmission and Distribution.

Approach overview. The output of solar electricity systems mostly depends on solar radiation. Hence, related studies focus on forecasting either solar radiation or photovoltaic (PV) power. Specifically, 27 papers contain a prediction of solar radiation. When examining the approaches used within this group, ANNs and variations of this method are the most popular [32,56,74,83-98]. SVMs are the second biggest group [82,99,100]. Tree-based approaches can be found in [66,89]. The authors of [55,73] use Time Series methods. Hybrid approaches as presented in [88,101] combine more than one approach into a new model. For example, the author of [88] combines an ANFIS model with particle swarm optimization (PSO), differential evolution (DE), and a Genetic Algorithm (GA). The group of publications focusing on wind energy includes fourteen papers. It is noticeable that within the studies of wind generation, hybrid approaches are more common than for solar generation [102-106]. However, ANNs also play an important role within this group [32,84,85,107-109], and an SVM is applied in [110].

Data sets. Most analyses are built on single data sets.
The biggest share of used data sets originates in the USA (7), followed by China (6), France (3), India (3) and Australia (3). The time span of these data sets ranges from several days [91] to over 100 years [100]. Yet, most of the data sets cover several months up to a few years. The time interval usually extends from hours to months. An outlier in this group is [93]; their data set consists of 48,000 single data points, each representing 1 minute of PV power output. The split between test, validation, and training set is ambiguous and differs between the studies. A trend can nonetheless be identified: newer publications tend to use a training set of over 70% [83,88-90,104,110].

Feature selection. In relation to the input features, wind and solar power forecasting studies must be differentiated. The former group focuses mainly on forecasting the power generation of wind turbines. Only a few studies predict both power generation and wind speeds [84,105,108]. The authors of [84] and [108] develop a two-stage forecasting model, where the wind speed forecast is used as an input feature of the power generation forecast. The proportion of used features within wind power forecasting studies is presented in Table 3 (there, 'Other weather-based data' include temperature, relative humidity, and weather prediction). The table shows the importance of the variables wind power and wind speed -half of the studies do not consider any other input feature in their models. Few publications use statistical methods for feature selection. The authors of Mabel and Fernandez [32] conduct a correlation analysis in this respect. As a result, they suggest that besides wind speed, relative humidity and generation hours are important parameters influencing wind power generation. However, the evaluation of input features for wind power forecasting models is still an area with potential for future research.
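A correlation analysis of candidate input features, in the spirit of the one in Mabel and Fernandez [32], can be sketched as follows (the series are made up for illustration; real studies would correlate measured wind farm data with power output):

```python
import math

def pearson(x, y):
    # Pearson correlation coefficient between two equal-length series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy series: wind power output and two candidate input features.
power      = [1.0, 2.1, 2.9, 4.2, 5.1]
wind_speed = [2.0, 3.0, 4.1, 5.0, 6.2]   # strongly related by construction
humidity   = [55, 60, 52, 58, 61]        # weakly related by construction

for name, feature in [("wind_speed", wind_speed), ("humidity", humidity)]:
    print(name, round(pearson(feature, power), 3))
```

Features whose correlation with the target is high would be kept as model inputs; note that Pearson correlation only captures linear dependence, so nonlinear relationships can be missed.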
Regarding the group of studies investigating solar power forecasting, one difference from wind power forecasting is that the vast majority of papers focuses on solar radiation as a moderator for solar power generation and not on solar power forecasting itself. However, the input features used between studies focusing either on radiation or on power are similar -both use historical data of the output variable as input.
Some contradictions in feature evaluation in solar forecasting can be observed. In Deo and Şahin [86], the authors obtain the best performance for the combination of extraterrestrial radiation and daily temperature, whereas the authors of Yadav et al. [96] suggest that extraterrestrial radiation is one of the least influential input features. Both studies agree that temperature information should be considered, because it represents one of the most important variables for accuracy enhancement. In general, studies agree that meteorological information improves the performance of the models (e.g. [56,91,101]).

Benchmark approaches. All publications in this area, except for eight, compare their model with other models. Across all papers, there is no common benchmark standard. Generally, the benchmark methods can be divided into three major groups. First, some authors compare their model against one from another category of approaches. For example, the authors of Halabi et al. [88] check the performance of their ANFIS models against SVM models. Second, some authors compare their model to a model within the same approach category. For example, the authors of Yadav et al. [96] use different ANN models as a benchmark for their generalized regression neural network (GRNN) and radial basis function neural network (RBFNN) models. Third, some publications describe and design more than one approach in detail (e.g. different variations of an ANN) and benchmark them against each other. The authors of Wang et al. [90] compare the performance of their RBFNN, GRNN, multilayer perceptron (MLP), and empirical improved Bristow-Campbell model against each other. Also, the authors of Mason et al. [85] train their recurrent neural network (RNN) and benchmark the results of each of the seven different algorithms against each other.

Evaluation metrics. We cannot observe a consensus regarding the measures for evaluation.
The most popular is root mean square error (RMSE), followed by mean absolute error (MAE), and then by mean absolute percentage error (MAPE). More than 50% of the publications use one or more of these measures to evaluate their results. An overview of all measures is presented in Table 4 .
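The absolute and relative error measures named above, plus the mean bias error (MBE), can be computed as follows (the PV output values are made up for illustration):

```python
import math

def error_metrics(actual, forecast):
    # RMSE and MAE are absolute measures; MAPE is relative; MBE keeps
    # the sign of the errors and so reveals systematic over- or
    # under-forecasting that the other three hide.
    errors = [f - a for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / a) for a, e in zip(actual, errors)) / n,
        "MBE": sum(errors) / n,
    }

# Made-up hourly PV output (kW) and a forecast that underestimates slightly.
actual   = [10.0, 20.0, 30.0, 40.0]
forecast = [ 9.0, 21.0, 28.0, 38.0]
m = error_metrics(actual, forecast)
print({k: round(v, 3) for k, v in m.items()})
```

Here the negative MBE flags the systematic underestimation even though RMSE, MAE, and MAPE would look identical for a symmetric error pattern; MAPE is also undefined whenever an actual value is zero, which matters for PV output at night.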
RMSE and MAE are easy to interpret as they are based on absolute values. However, this also means that they complicate comparison between data sets, and they assign high weights to large absolute errors, which might be unwanted in some use cases. Both measures can be useful for comparing different models on the same data set. These absolute metrics can be complemented by relative measures such as MAPE, which also enables comparisons across data sets. We would like to point out that MAPE is biased in favor of models which underforecast [111], and therefore recommend not relying on a single error measurement, but using a combination of absolute and relative error measures. In addition, we suggest including bias measures such as the mean bias error (MBE) to check for systematic errors in the forecast model.

Complexity and running speed. Only a few authors provide information about the complexity and running speed of their models. The computation setup is also rarely provided. The authors of Ramli et al. [95] state the running speed of their SVM and compare it against an ANN. The authors of Halabi et al. [88] also provide the execution time for each ANFIS model to evaluate their performance. In real-world use cases, the selection of the best-suited approach can depend on running speed and costs of computation, especially as models get more complex. We suggest providing the running speed alongside the corresponding computational setup as best practice.

Notable approaches. First, multiple studies employ notable ANN-based approaches. The authors of Jursa and Rohrig [109] give an exceptional overview of the performance of different approaches and combinations of them. They compare PSO, DE, an ANN with back-propagation (BP), and a Nearest Neighbour Search (NNS) approach, as well as combinations of these. The authors evaluate the performance of each approach for wind power forecasting for ten wind farms in Germany. ANN-PSO provides the highest accuracy.
However, even better results are obtained when using the mean model output of ANN-PSO and NNS-PSO, which reduces the error by 10.75%.
Second, Bhaskar and Singh [84] develop a two-stage forecasting approach, in which the wind speed is predicted first and then used as a basis to forecast the wind power output. For both stages, a feed-forward neural network (FFNN) and an adaptive wavelet neural network (AWNN) are evaluated and compared. AWNN is used for wind speed forecasting because of its better approximation capability and faster training compared to the FFNN; for the second stage, a FFNN is selected. The approach is then evaluated against two naive approaches, and the results confirm the higher accuracy of the proposed model, which obtains an average normalized MAE of 7.08% and an RMSE of 10.22%.
Third, hybrid approaches have achieved promising results in this area [103,[112][113][114]]. The authors of Wang et al. [112] develop a hybrid probabilistic approach based on wavelet transformation (WT), a convolutional neural network (CNN), and an ensemble technique, which is tested using historical time series of wind farms in China. The proposed approach is superior to the benchmark approaches - persistence, back propagation with quantile regression, and SVM with quantile regression - in terms of average coverage error and interval sharpness for all examined seasons and time horizons. The authors of Yuan et al. [103] develop a model based on a least squares support vector machine (LSSVM) and a gravitational search algorithm (GSA) to forecast short-term wind power. Compared to the single approaches FFNN-BP, SVM, and LSSVM, and the hybrid method SVM-GSA, the proposed model shows better performance on all used evaluation metrics. The absolute error of the proposed model is less than 3% and its correlation coefficient is 0.9087. The authors of Xie et al. [114] combine an Optimized Discrete Grey Model (ODGM) for forecasting the total consumption amount with a Markov model for trends in the energy generation structure. The authors of Alessandrini et al. [113] compare two probabilistic hybrid approaches, namely ECMWF-EPS (the Ensemble Prediction System in use at the European Centre for Medium-Range Weather Forecasts) and COSMO-LEPS (the Limited-area Ensemble Prediction System developed within the COnsortium for Small-scale MOdelling).

Summary
Although hydropower is the most used renewable energy source in the world [115], the current academic focus lies on solar and wind generation forecasting. The main reason for this may be the high volatility of solar and wind power, which creates unique challenges for integrating them into the electricity system. Hydropower, on the other hand, is often controllable, like other forms of conventional power generation.
In the future, Data Analytics research may also address other volatile sources such as wave energy [72] .
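The two-stage scheme described above for Bhaskar and Singh [84], in which a wind-speed model feeds a wind-power model, can be sketched as follows. Plain least-squares lines stand in for the AWNN and FFNN of the study, and all numbers are synthetic and purely illustrative:

```python
# Two-stage forecasting sketch in the spirit of [84]: stage 1 predicts
# wind speed from an exogenous input, stage 2 maps the predicted speed
# to power output. Simple least-squares lines stand in for the AWNN and
# FFNN models; all data below is made up.

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Synthetic training data: weather-model input -> speed -> power.
nwp   = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical weather-model input
speed = [4.1, 6.0, 8.1, 9.9, 12.0]   # measured wind speed (m/s)
power = [0.9, 2.1, 3.0, 4.1, 5.0]    # measured output (MW, toy linear regime)

a1, b1 = fit_line(nwp, speed)    # stage 1: wind-speed model
a2, b2 = fit_line(speed, power)  # stage 2: power model

def forecast_power(x):
    speed_hat = a1 * x + b1      # first forecast the wind speed ...
    return a2 * speed_hat + b2   # ... then the power based on it

print(round(forecast_power(3.5), 2))
```

In the real approach, each linear stand-in would be replaced by the trained AWNN or FFNN, but the chaining of the two stages remains the same.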
Apart from the differences described in the factors influencing the performance of the models, some general aspects can still be established regarding the approaches. First, ANN is the dominant method used for forecasting both wind and solar generation. Particularly in the area of solar power forecasting, a wide range of ANN variations is applied. However, in wind power forecasting studies, this approach is more often seen in combination with other techniques than by itself. In many cases, these hybrid approaches outperform others, although exceptions do occur [105].
Interest in SVMs, which are used in both solar and wind forecasting, has been increasing over the last few years. Although not used as frequently as ANNs, this approach shows good performance, even surpassing ANNs in some cases for solar radiation prediction [95] and in the context of wind power [103]. Furthermore, as mentioned above, SVMs seem to perform well when integrated into hybrid approaches. Tree-based approaches have emerged recently, providing very satisfactory results. In various solar radiation forecasting studies, tree-based approaches have outperformed ANNs [89] and also SVMs [99]. As established by Hong et al. [5], several of the best-ranked teams in the GEFCom2014 forecasting challenge, in both the solar and wind power forecasting categories, developed a tree-based approach. This suggests that this direction should be explored further in this context.
Probabilistic forecasting of wind power [5,107,112] and solar power [5] has gained attention in recent years and has been shown to deliver promising results.
Further improvements in forecasting performance can be achieved by conducting thorough feature selection with statistical methods. Lastly, studies can demonstrate the practical usefulness of their approaches by evaluating error metrics in combination with computational time and modelling effort.
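The combination of absolute, relative, and bias measures recommended above can be implemented in a few lines; a minimal sketch:

```python
import math

# Minimal implementations of the recommended error measures.
# y: actual values, f: forecasts (equal length; non-zero actuals for MAPE).

def mae(y, f):   # mean absolute error
    return sum(abs(a - b) for a, b in zip(y, f)) / len(y)

def rmse(y, f):  # root mean square error; weights large errors heavily
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)) / len(y))

def mape(y, f):  # mean absolute percentage error, in percent
    return 100 * sum(abs((a - b) / a) for a, b in zip(y, f)) / len(y)

def mbe(y, f):   # mean bias error (forecast minus actual);
    return sum(b - a for a, b in zip(y, f)) / len(y)  # sign reveals systematic bias

actual   = [10.0, 12.0, 8.0, 11.0]
forecast = [9.0, 13.0, 8.0, 10.0]
print(mae(actual, forecast))   # -> 0.75
print(mbe(actual, forecast))   # -> -0.25 (a slight underforecast)
```

Reporting MAE and RMSE together with MAPE and MBE follows the recommendation above to combine absolute, relative, and bias measures rather than relying on a single one.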

Trading
Thirty-one studies from the final analysis pool are classified in the Trading category. All of these studies deal with price forecasting. Accurate forecasting of electricity prices concerns all market participants, including generators, utilities and power brokers, since it is crucial for developing bidding strategies and for making strategic, tactical and operational business decisions.
Price forecasting is typically classified according to the forecasting horizon into short-term (STPF), medium-term (MTPF) and long-term price forecasting (LTPF). In line with Weron [67], we classify studies with a forecasting horizon of up to a few days as STPF, studies with a horizon from a few days up to a few months as MTPF, and studies with horizons of several months and longer as LTPF. Of all studies taken into account, 30 focus on STPF and one focuses on MTPF. Fig. 8 presents the approaches used for price forecasting in the reviewed studies. On the left, the proportion of all approaches used is shown. On the right, the proportion of approaches used in the context of hybrid approaches is shown.
The studies are further classified with regard to the pricing type used. The most common type in the literature is the system-wide Market-Clearing Price (MCP), with 23 studies. In addition, seven studies address the forecasting of Locational Marginal Prices (LMP), also called Nodal Prices. Because LMPs heavily depend on location, LMP market systems carry a higher level of complexity and therefore greater transaction costs [11]. This is one reason why the majority of medium-sized markets calculate a price for the entire system, and LMPs are usually computed only for major markets [67].

Approach overview
Fig. 9 presents the absolute number of studies differentiated by approach for LMP and MCP forecasting publications. It highlights the importance of ANNs, since this approach is used by the majority of studies [116][117][118][119]. One study employs an approach based on Time Series [120], and three employ a Hybrid approach [40,90,121]. For MCP forecasting, ANN-based methods also play a major role [28,33,40,46,49,59,61,117,119,[122][123][124][125][126]]. Other relevant approaches are Time Series [35,52,127,128], hybrid approaches [106,121,125,[129][130][131]], and SVMs [132].

Data sets
Most commonly, studies use electricity price data from Spain (11 articles), PJM (7), California (5), or Australia (5). Studies which apply the same approach to different data sets have not found remarkable differences between MCP and LMP forecasting [117,121,126].

Feature selection
The input features used vary from study to study. Since the performance of a model is strongly related to the inputs used, it is difficult to compare two approaches that do not use the same input variables. Table 5 presents the proportion of input features used within the examined pool of Trading literature.
More than half of the studies consider only price, or price and demand, as input variables. Since the selection is often made based on the experience of the forecaster, many authors agree that the optimal choice of features should be a focus of future research [28,59,67,123]. For instance, the authors of Singhal and Swarup [59] suggest generator availability and bidding strategy as potential features. In general, under low price volatility, even considering only the prices of similar days has provided adequate results [129]. However, from time to time prices exhibit sudden jumps to extreme levels, making it difficult for forecasting models to remain accurate if they rely on historical prices, demand, and time indices alone (e.g. [59]). These price spikes are often attributed to unexpected increases in demand, shortfalls in generation, and failures in the transmission or distribution lines [133]. Consequently, an understanding of the factors contributing to the occurrence of extreme prices, together with a careful selection of input features for the models, could improve their accuracy. Two studies in our pool concentrate particularly on feature selection. The authors of [132] explicitly select input features in a recursive manner: starting with the default input of hourly electricity demand, the next potential element is added to the model to test whether it improves accuracy; the next potential input is then tested, and so on. The final selected features of the model are hourly electricity demand, daily peak electricity demand, monthly average electricity demand, daily price of natural gas, the previous years' monthly average electricity MCP, and time codes. The authors of Bento et al. [121] reduce the input feature vector to hourly price data of the three most similar days and of the six days prior to the current day.
This reduction is conducted to achieve acceptable computational costs and results in a running time of less than 30 minutes per forecast.

Benchmark approaches
All recent studies employ several benchmarks. It can therefore be considered best practice to do so. For exceptionally extensive comparisons of state-of-the-art approaches, we refer to Bento et al. [121] and Lago et al. [49], who include time series, hybrid, and other machine learning approaches.

Evaluation metrics
Price forecasting studies apply a variety of error measures. The proportion of error measures used within the analyzed studies is presented in Table 6. The problem with the popular measures based on absolute errors - MAE, RMSE and RMSE-based measures - is that they make it difficult to compare results between different data sets. For this purpose, relative error measures such as MAPE are more appropriate. However, MAPE has other drawbacks: if actual price values are close to zero, MAPE becomes very large, and it is even undefined for actual prices of zero. To overcome these disadvantages, some authors substitute the actual price with the average of actual prices [33,35,[116][117][118]131] or the median of actual prices [122].
Alternatives to MAPE are the mean absolute scaled error (MASE) and the symmetric mean absolute percentage error (sMAPE). Only a few studies have used these measures in this area, e.g. [124] and [49]. We recommend the use of MASE because, in addition to not relying on division by the actual price and to being comparable across data sets and scales, it penalizes positive and negative errors equally. It can also be easily interpreted [134].

Complexity and running speed
Not all studies provide useful insights into the computational complexity of their approaches. Only a minority of studies states the computational setup used. Besides, some authors report the total computational time needed - pre-processing steps included - while others mention only the run time of the forecast. The authors of Anbazhagan and Kumarappan [119] showcase the relevance of computational complexity for model selection very well. The study uses the Elman network variant of the RNN to forecast LMPs. Results show good accuracy, with an average weekly MAPE of 3.82%. However, the proposed model is slightly outperformed by some of the hybrid benchmark models; specifically, by a WT and hybrid of neural networks and fuzzy logic (WNF), a wavelet-ARIMA-RBFNN, a cascaded neuro-evolutionary algorithm (CNEA), and an adaptive-network-based fuzzy inference system (WPA) model. However, the RNN's average computation time is about 650 milliseconds, whereas the hybrid approaches require 5 seconds, 5 minutes, 40 minutes, and 1 minute, respectively. Considering this trade-off between accuracy, computational time, and complexity, the RNN is named the best choice.

Notable approaches
For LMP forecasting, neural networks are used by the vast majority of authors. Classic FFNN-BP models are applied by Mandal et al. [116] and Vahidinasab et al. [118]. The results suggest that ANNs can capture the non-linear behavior more precisely than traditional time series approaches.
Other authors develop more advanced versions of neural networks. The authors of Pindoriya et al. [117] combine the classical FNN with wavelet theory into an AWNN model, which provides higher accuracy than GARCH and MLP models. Furthermore, the incorporation of load demand further improves the accuracy of LMP forecasts. The authors of Bento et al. [121] use a bat algorithm (BA) for parameter selection, WT as a pre-processing step to decompose the price time series into components with stable variance and fewer outliers, and a combination of BA and a scaled conjugate gradient (SCG) algorithm for training an ANN. An RNN with Elman architecture is applied by Anbazhagan and Kumarappan [119] and Hong and Hsiao [40]. In the former work, the proposed model is selected as the best choice when a trade-off between accuracy, computational time, and complexity is considered. In the latter work, the results show, on the one hand, the effectiveness of the RNN against the MLP and, on the other hand, that performance improves when the model is combined with FCM for clustering the data. As the only non-ANN approach in our pool, Liu and Shi [120] evaluate and compare ARMA-GARCH approaches on the New England market. ARMA-SGARCH-M achieves the smallest MAE value (0.122) and additionally offers a low model construction complexity.
For MCP forecasting, the neural network family is again the most prominent. Within this group, the classic FNN is used by the majority of authors [33,59,61,118,[122][123][124]]. Compared to naive or time series approaches, the proposed models show better results. However, the traditional ANN is outperformed by other special variants of the ANN family in the studies of [28] and [126]. The former uses fuzzy logic, i.e. fuzzy neural networks (FNNs), as a tool to soften the non-stationary and non-linear MCP signals, while the latter applies the concept of deleting 'bad' samples for learning, as opposed to the ANN, which selects all samples, and the SVM, which only selects the 'good' ones.
After ANN approaches, hybrid approaches represent the second largest group within MCP forecasting studies. Fig. 8 disaggregates these models into their component methods. Once again, the family of neural networks represents the major share in this group [125,129,130]. Second in importance are 'other approaches', which are not found among non-hybrid approaches. This suggests that some methods do not play an important role by themselves in this area, but do in combination with others - particularly with ANNs. This is the case with some heuristic algorithms, such as the firefly algorithm (FA) applied by Wang et al. [106] and the gravitational search algorithm (GSA) used by Shayeghi and Ghasemi [131]. Similarly, clustering approaches play a role in STPF when combined with other methods [40,46]. Although these studies show accurate results, the authors of Lago et al. [49] suggest that hybrid methods do not provide better accuracy than their simpler counterparts. In addition, they often need more computational time (e.g. [106]). Support vector machines are also used in this context, although to a lesser extent, as part of hybrid approaches [131] or by themselves. An example of the latter group is presented in Yan and Chowdhury [132]. The study proposes a model which combines Least Squares SVMs (LSSVM) for classification and forecasting, obtaining good results and outperforming single LSSVM, LSSVM-ARMAX, ARMAX, and SVM models. The authors of Lago et al. [49] suggest that, in general, SVMs, ANNs, and tree-based approaches outperform time series approaches.
Deep learning (DL) architectures have not been widely used in this context. Apart from the studies mentioned above that apply a simple version of an RNN [40,119], only the authors of Lago et al. [49] highlight the potential benefit that deeper structures could bring to electricity price forecasting. The authors compare four different DL models: deep neural networks (DNN), long short-term memory networks (LSTM), gated recurrent units (GRU), and convolutional neural networks (CNN). Three of the four DL models outperform the remaining approaches, showing very good performance and suggesting the potential of these methods.
Although most of the papers in the literature apply point forecasts, there has been recent progress in probabilistic forecasts. The authors of Hong et al. [5] review forecasting approaches, price spike pre-processing techniques, and combined forecasting models. Furthermore, the authors of Nowotarski and Weron [22] offer guidelines for using methods, measures, and tests in the context of probabilistic electricity forecasting.

Summary
Although the problems explained in this section complicate the comparison among approaches, some generalities can be established.
First, time series techniques exhibit reasonably good performance when volatility is low. However, the reviewed studies that propose a time series model do not provide benchmark comparisons with approaches from other method families. In addition, when time series techniques are used as a benchmark, they are often outperformed by other methods (e.g. [33,46]).
Second, hybrid approaches have gained attention in the last few years and authors report very good performance metrics. However, the performance in relation to their simpler counterparts is still disputed. For instance, the authors of Lago et al. [49] obtain better performance with a simple SVR model than with a hybrid SVR-based model. Furthermore, when time complexity is balanced with accuracy, some authors suggest that single machine learning methods are a better choice because of their smaller computational effort compared with hybrid approaches (e.g. [119] ). We argue that the potential justifies further research on Hybrid approaches, in line with Weron [67] .
Finally, SVM and tree-based approaches likewise show potential in the electricity price forecasting context. Although they have not been used frequently for this purpose, in the extensive benchmark comparison of Lago et al. [49] they are part of the leading group in terms of performance.
One of the most distinct properties of the electricity price time series is its volatility. In this sense, one challenge of the models is to remain robust, even in cases of high volatility. A well-informed selection of the adequate Data Analytics approaches is therefore crucial.
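The greedy input selection described for [132] earlier in this section, where each candidate feature is kept only if it improves validation accuracy, can be sketched as follows. A small least-squares model stands in for the model of the original study, and the feature columns and values are made up for illustration:

```python
# Greedy forward selection in the spirit of [132]: start from the default
# input (hourly demand) and keep each candidate feature only if it lowers
# the validation error. A least-squares model stands in for the SVM of the
# original study; all feature names and values below are made up.

def solve(A, b):
    """Gauss-Jordan elimination for small linear systems A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[c][c]:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_predict(Xtr, ytr, Xva):
    """Least squares via the normal equations, then predict on Xva."""
    k = len(Xtr[0])
    A = [[sum(r[i] * r[j] for r in Xtr) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(Xtr, ytr)) for i in range(k)]
    w = solve(A, b)
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in Xva]

def val_mae(cols, Xtr, ytr, Xva, yva):
    sub = lambda X: [[row[c] for c in cols] for row in X]
    pred = fit_predict(sub(Xtr), ytr, sub(Xva))
    return sum(abs(p - y) for p, y in zip(pred, yva)) / len(yva)

def forward_select(Xtr, ytr, Xva, yva, default=(0,)):
    chosen = list(default)                    # column 0: hourly demand
    for cand in range(len(Xtr[0])):
        if cand not in chosen and \
           val_mae(chosen + [cand], Xtr, ytr, Xva, yva) < \
           val_mae(chosen, Xtr, ytr, Xva, yva):
            chosen.append(cand)               # kept only if it helps
    return chosen

# Hypothetical columns: 0 = hourly demand, 1 = gas price, 2 = calendar code.
Xtr = [[1, 2, 5], [2, 1, 3], [3, 4, 1], [4, 3, 4], [5, 5, 2], [6, 2, 6]]
ytr = [8.1, 6.9, 18.05, 16.95, 25.1, 17.9]
Xva = [[2, 3, 2], [5, 1, 5], [3, 2, 3]]
yva = [13.0, 13.0, 12.0]
print(forward_select(Xtr, ytr, Xva, yva))  # the informative gas-price column is kept
```

The same acceptance rule carries over unchanged when the stand-in model is replaced by an SVM or ANN; only `fit_predict` needs to be swapped out.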

Transmission and distribution
The research employing Data Analytics methods in the operation and control of transmission and distribution grids is very diverse. It is also often interconnected with other areas. This is especially the case for load forecasting. In [29,82,135,136] the authors combine load forecasting with considerations of transmission or distribution system research. For instance, the authors of Ding et al. [136] use load forecasting as a vehicle for improved distribution system operation.
The majority of papers that use analytics methods focus on the real-time operation of transmission and distribution grids and develop intelligent controllers. Such control strategies are described in [57,68,137,[138][139][140][141][142][143][144][145]] and reviewed in [53]. The latter puts a major focus on the inclusion of energy storage in load frequency control strategies. Another important stream of research in this area is failure prediction and analysis. We analyze four papers in this research stream [54,58,146,147]. In addition, three papers conduct Data Analytics research on non-technical losses, usually related to energy theft [25,50,51]. A review on the topic is provided in Viegas et al. [65]. The authors find that the major missing pieces of this research stream are methods that can identify all different kinds of non-technical losses. Furthermore, they propose a typology for papers covering the issue. Cybersecurity of electricity systems is considered by Pan et al. [148]. The authors of Kang and Lee [149] use empirical data to assess the reliability of demand curtailment offers. Such an empirical data set is rarely seen in this research direction. In [150], a corresponding literature review on load shedding is provided. In [151] the authors predict future grid congestion to ensure stable grid operation. The deployment of power plants for congestion management is considered in [60]. Finally, the author of [152] describes the operation of a robot for power line maintenance. Altogether, we consider 30 papers in this area.

Approach overview
The approaches used in this area are as diverse as the research directions themselves. The developed control strategies almost always rely on neural networks: in 10 out of 12 considered papers, the authors use some form of ANN. In [140], the authors do not propose a new control method as such, but rather a paradigm that could reduce training times using an SVM.
In [144], a method is proposed to improve neural network-based controllers. The stream of failure prediction and analysis is based on ANNs, except for [54], where martingale boosting is used. The considered research on non-technical losses is always based on SVMs, and the research on load forecasts is always based on ANNs. Finally, the remainder of the research is very diverse, and so are the employed approaches. Most use different forms of neural networks or support vector machines. The only unsupervised approach is used in [149], where the authors employ a k-nearest neighbors and a k-recent model. In [148], the authors use common path mining to identify intrusions into the cybersystem of a utility.

Feature selection
The control strategies are always developed based on technical system variables such as phase angles or voltages. Generally, it is often not explained why the chosen features are relevant, making the feature selection seem arbitrary. Some papers do not describe the used features in detail at all. The situation is similar for failure prediction and analysis, where the authors use technical variables of the system to describe a fault. It might be interesting to include more external variables and to describe the feature selection in more detail. In both research streams, the feature data is almost exclusively based on simulations instead of empirical data. This is understandable, as it allows conditions in the network to be simulated which might otherwise be dangerous or rare. However, it would be important to develop an empirical data set that can be used to test the respective approaches, since it is difficult to extract interesting aspects from a simulated environment where cause and effect are so easily connectable. This is different from the literature on non-technical losses, where authors always use some form of empirical data.
However, the data sets lack information on what actually induced the non-technical loss and are therefore somewhat limited. In the research stream of load forecasting, the authors of [136] are the only ones to use data beyond autoregressive or technical features. While this is common practice in the load forecasting area, it seems less common in the system operator area, which might be caused by a more technical perspective on the problems. From the remaining research, only Staudt et al. [60] and Kang and Lee [149] use empirical features. In summary, future studies should focus on the use of more empirical data, as up to now simulated data has been predominant.

Benchmark approaches
For control strategies, the developed approaches are commonly benchmarked against the performance of a regular proportional-integral-derivative (PID) controller. In other cases, the developed approaches support this controller. In the remaining literature, no common benchmark is used. Given that most research does not rely on common data sets, it is more difficult to choose appropriate benchmarks. The only paper that actively references another paper as a benchmark is [148]. Some papers do not use benchmarks at all.

Evaluation metrics
Evaluation is not performed homogeneously. Interestingly, some papers from the research stream of control strategies do not use any specific metric, but rely on visual evaluation of their results. However, metrics such as the integral of the time multiplied absolute value of the error (ITAE) can be used, as for example in [143] and [144]. Similarly to the discussion on benchmarks, we cannot find any common ground in the remaining papers. There is obviously no consensus in the research communities of the different research streams. This could be overcome through common data sets that would facilitate the comparison between different papers.

Notable approaches
Given the very heterogeneous field, it is hard to identify specific trends.
However, it is notable that two of the three reviews in the area, on non-technical losses and on load frequency control, were published very recently. This shows that there is a need to consolidate what has been accomplished thus far. A few individual notable approaches are described in the following.
One of the most notable papers from the research stream of load frequency control is [142]. The authors propose an innovative ANN approach based on a Hopfield network and find superior performance results for the proposed controller. They perform extensive benchmarking, not only against the standard PID controller but also against other possible controller approaches. The authors also evaluate their approach using well-defined performance metrics and test their controller in several case studies. The paper can therefore serve as an indication of what a paper should consider when proposing a frequency controller.
The work by Kang and Lee [149] is very noteworthy due to the data sets used. The authors have information on the actual load curtailment of individual participants in demand response (DR) programs over a period of about two years. While many authors consider demand-side flexibility, they rarely use field data, as most of this research field focuses on more theoretical solutions given that such programs are rarely implemented. The authors analyze the response rate of the participants who are contracted for demand curtailment. To do so, they establish a baseline of consumption and then analyze the response to a signal relative to that baseline. The observed customers have different contracts that can even change year by year, so the authors cannot simply observe actual load reductions. However, the research is important, as many researchers consider DR aggregators an important role in future energy markets, intended to react to volatile renewable generation. Even though the results of their specific application are disappointing, their approach using ensemble classifiers is interesting. The data used in this paper can serve as a benchmark for other researchers and could set a standard for future research on DR and demand-side management.
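A baseline-and-response calculation of the kind described can be sketched as follows. The averaging baseline below is a common simplification; the exact procedure of Kang and Lee [149] is not reproduced here, and all load values are toy numbers:

```python
# Baseline-and-response sketch for demand curtailment. The baseline is the
# hourly mean over recent non-event days (a common "X-of-Y"-style
# simplification); the cited study's exact method may differ.

def baseline(non_event_days):
    """Hourly mean load across a list of equally long day profiles."""
    hours = len(non_event_days[0])
    return [sum(day[h] for day in non_event_days) / len(non_event_days)
            for h in range(hours)]

def response(event_day, base, event_hours):
    """Curtailment delivered per event hour: baseline minus actual load."""
    return [base[h] - event_day[h] for h in event_hours]

# Toy 4-hour profiles (kW): three ordinary days and one curtailment day.
history = [[50, 60, 70, 65],
           [52, 58, 72, 63],
           [48, 62, 68, 67]]
event   =  [50, 60, 55, 50]            # load was reduced in hours 2 and 3

base = baseline(history)               # -> [50.0, 60.0, 70.0, 65.0]
print(response(event, base, [2, 3]))   # -> [15.0, 15.0] kW curtailed
```

Comparing the delivered response against the contracted curtailment then gives the response rate per participant.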
In [60], the authors predict transmission system congestion. This paper is notable for its use of empirical data and for the derivation of business strategies from the analytical model results. The authors aim to forecast whether a certain power plant will be redispatched to reduce transmission grid congestion, using different models and benchmarking them against each other. They show that these models have different advantages depending on the stakeholder, due to the characteristics of their forecasts: one model has better precision, while the other has better recall. These metrics matter to different stakeholders, and the authors go on to describe appropriate strategies based on these different kinds of foresight.

Summary
Research on Data Analytics in the area of distribution system operators (DSOs) and transmission system operators (TSOs) is very diverse and covers multiple topics. However, this area still holds a lot of potential for further exploration of Data Analytics approaches. The most notable stream is the use of different architectures of neural networks in control strategies to achieve a balanced grid frequency in the event of disturbances. This research is mostly based on simulations, and the authors often show that their controllers achieve good stability values. However, even though this is a well-researched topic, there are no clear guidelines and no common structure for such approaches. The authors do not use the same problem formulations, but different simulation setups and evaluation metrics. Some do not use any evaluation metrics at all but limit themselves to a visual analysis of their results. This research direction would benefit from more empirical data, common data sets and common evaluation metrics. Where applicable, we recommend quantitative evaluation, using for instance the ITAE, as well as a thorough description of features.
Additional potential may lie in extending the common feature set to more external variables such as weather predictions.
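The precision/recall trade-off discussed for the congestion forecasts in [60] can be made concrete with a small sketch; the labels and predictions below are synthetic:

```python
# Precision and recall for a binary redispatch forecast. 1 = the plant is
# redispatched (congestion event), 0 = it is not; all labels are made up.

def precision_recall(actual, predicted):
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0  # flagged events that were real
    recall = tp / (tp + fn) if tp + fn else 0.0     # real events that were flagged
    return precision, recall

actual   = [1, 0, 1, 1, 0, 0, 1, 0]
cautious = [1, 0, 0, 1, 0, 0, 0, 0]  # flags few events
eager    = [1, 1, 1, 1, 0, 1, 1, 0]  # flags many events

print(precision_recall(actual, cautious))  # -> (1.0, 0.5): precise but incomplete
print(precision_recall(actual, eager))     # -> (0.666..., 1.0): complete but noisy
```

A stakeholder who pays for false alarms prefers the cautious, high-precision model; one who pays for missed congestion events prefers the high-recall model.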
Another frequently researched topic is the evaluation of non-technical losses, which are often attributed to energy theft. The authors of these papers mostly try to identify atypical load patterns and then characterize these as theft, although it can be hard to evaluate their success, as it is unclear whether the found outliers actually constitute theft. Again, this stream of research would benefit from better benchmark data sets, and the authors should also compare their results with other studies. Some authors classify faults or congestion in transmission grids, with two studies actually using empirical data for their analysis. This area of research is still very diverse and thus generally interesting for further research. Some research focuses on load frequency control, often connected to the design of an intelligent controller that activates the LFC when necessary. In contrast to the controller design papers mentioned at the beginning of this section, these studies do suggest evaluation metrics. Some individual research on different topics, such as the lifetime-optimal operation of batteries, exists but is more isolated. One of the most notable findings is that the research in this area has very little in common. It might be a useful task to better structure research in the area of DSO/TSO operations. There is no coherence regarding data, evaluation metrics or methods, both in this subsection overall and even within papers on the same topic.
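The ITAE criterion recommended above for controller evaluation can be computed directly from a sampled error signal; a minimal sketch using a simple rectangle-rule approximation:

```python
# The ITAE criterion, the integral of t * |e(t)| dt, approximated with a
# rectangle rule over a sampled error signal. Late-settling errors are
# penalized most; the two responses below are made-up examples.

def itae(errors, dt):
    """errors: sampled control error e(t); dt: sampling interval (s)."""
    return sum(k * dt * abs(e) * dt for k, e in enumerate(errors))

# Frequency deviation (Hz) of two hypothetical controllers after the
# same disturbance:
slow = [0.0, 0.4, 0.3, 0.2, 0.1, 0.05]  # decays slowly
fast = [0.0, 0.4, 0.1, 0.0, 0.0, 0.0]   # settles quickly

print(itae(slow, 0.1) > itae(fast, 0.1))  # -> True
```

Because the error is weighted by elapsed time, the slowly decaying response accumulates a larger ITAE even though both start with the same initial deviation, which is exactly why the metric rewards fast settling.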

Consumption
Within the consumption area, we structure the studies into four applications: Forecasting, Analysis, Clustering and Control. In the following subsections, we review the most relevant aspects of Data Analytics research in each application. Forecasting of consumption is by far the most prominent application of Data Analytics in the electricity sector. We categorize consumption forecasting studies regarding their time horizon (short-term versus long-term) and their spatial scope (system-wide versus individual buildings, households, and electric vehicles (EVs)). We define all studies with a forecasting horizon of up to one week as short-term and all cases with longer horizons as long-term.

Short-term forecasting of system consumption
Short-term forecasting of system-wide consumption is crucial for system reliability tasks such as congestion management, wholesale trading and adequate scheduling and dispatching of power plants. We categorize 31 studies, plus five reviews, in this category. Of these, the large majority forecasts the total consumption within a given time interval. Some articles also forecast daily peak values [27,153,154]. Two studies focus on consumption forecasting of special days, i.e. public holidays, consecutive holidays, and days preceding and following holidays [155,156]. Approach overview In general, consumption forecasting of entire systems is often performed using ANNs [8,17,153,154,157–161] and ARIMA [156,162–164] models. These are often combined with other methods into hybrid approaches based on ANNs [17,85,123,155,165–172] or ARIMA models [27,170,173,174]. Further observed forecasting approaches are a Fuzzy Inference Model [175], Multivariate Adaptive Regression Splines (MARS), Holt-Winters exponential smoothing [63], GAs, Decision Trees [2], and hybrid approaches based on a Grey Model [176] or SVRs [177]. Data sets Most studies use real-world data from electricity systems in the USA (10), Australia (5), France (4), Great Britain (3) or China (3). The majority of studies use one data set to evaluate their method.
Exceptions are [63,154,161,169,172,174], which utilize multiple real-world data sets. The length of the time series used differs substantially between studies and ranges from one month [160] to 17 years [178] of data. Typically, hourly or half-hourly consumption data is used. A few studies utilize very granular consumption data at one-minute [159] or four-second [168] resolution. This enables them to forecast consumption in smaller intervals, i.e. ten and five minutes, respectively. Available data sets are usually split into a training and a test set, and sometimes an additional validation set. The size of the training set depends on the size of the total data set; training sets typically contain 50-75% of the data. Feature selection All forecasting studies use historical consumption data as an input feature. Multivariate models employ additional external variables as input features. The most common are temperature-related features (14), followed by type of day (9), relative humidity (4), and other weather variables (5). Only a few studies pay special attention to model-specific feature selection. The authors of [162] select weather features based on the average standardized regression coefficients. Out of five weather variables, only temperature shows a relevant influence on the daily load anomaly and is therefore selected. Recently, AL-Musaylh et al. [177,178] employed Partial Autocorrelation Functions (PACF) to determine the most significant input features. Benchmark approaches Apart from four studies, all articles in our pool compare their proposed approach to one or multiple benchmark approaches. Of those four, two ([158,164]) do not use a benchmark approach, but compare the performance of the proposed approaches to results from literature and expectations from industry. For the other studies, benchmarks range from conventional ANNs and ARIMA-based models to more sophisticated models or real-world industry models.
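PACF-based lag selection, as employed by AL-Musaylh et al. [177,178], can be sketched as follows. This is a minimal illustration, not the authors' implementation; the Durbin-Levinson recursion and the approximate 1.96/√n confidence band are standard textbook choices:

```python
import numpy as np

def pacf(x, nlags):
    """Partial autocorrelation function via the Durbin-Levinson recursion."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # sample autocorrelations r[0..nlags]
    r = np.array([np.dot(x[:n - k], x[k:]) / np.dot(x, x) for k in range(nlags + 1)])
    phi = np.zeros((nlags + 1, nlags + 1))
    pac = np.zeros(nlags + 1)
    pac[0] = 1.0
    for k in range(1, nlags + 1):
        if k == 1:
            phi[1, 1] = r[1]
        else:
            num = r[k] - np.dot(phi[k - 1, 1:k], r[1:k][::-1])
            den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
            phi[k, k] = num / den
            for j in range(1, k):
                phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        pac[k] = phi[k, k]
    return pac

def significant_lags(x, nlags):
    """Keep lags whose PACF exceeds the approximate 95% confidence band."""
    band = 1.96 / np.sqrt(len(x))
    pac = pacf(x, nlags)
    return [k for k in range(1, nlags + 1) if abs(pac[k]) > band]
```

The selected lags would then serve as the lagged-consumption input features of the forecasting model.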
An exceptionally wide range of non-conventional, oftentimes hybrid, approaches is assessed and compared in several recent studies [161,172,174,178]. Evaluation metrics Almost all studies evaluate the performance of their models using the MAPE. Oftentimes, MAPE is complemented with additional metrics, mostly RMSE, Standard Deviation, MAE or MSE. AL-Musaylh et al. [177,178] present an exceptionally large range of different evaluation metrics, including MAPE, Pearson Product-Moment Correlation coefficient, RMSE, MAE, Willmott's Index, Legates and McCabe Index, Nash-Sutcliffe coefficients, and the relative RMSE. Typically, studies report MAPE values of 0.4% to 3%. For special days, such as holidays, forecast performance tends to be worse, with reported MAPE values of 1.02% to 8.7%. Complexity and running speed Not many studies report details on the complexity and computational efficiency of their models, or the computational setup used in their experiments. Notably, the training time of ANN-based methods has decreased from several hours in the early 1990s [153] to minutes in the mid-1990s [165] and seconds in the early 2000s [159]. Hybrid approaches have longer running times than non-hybrid approaches, as they combine several methods. Their potential benefit in accuracy must therefore be weighed against increased computational effort and more complex model building. Notable approaches First, static and dynamic forecasting models can be differentiated. Lee et al. [157] present a dynamic ANN forecasting model which forecasts the load of the next 24 hours sequentially using the previous-time forecasts. Notably, the dynamic model achieves higher accuracy than a static approach for one-day forecasts, especially for peak forecasting, and trains faster.
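The default point-forecast metrics named above are straightforward to compute. A minimal sketch (note that MAPE is undefined for zero actuals, which becomes relevant for individual consumption later in this review):

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent (undefined for zero actuals)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(100.0 * np.mean(np.abs((a - f) / a)))

def rmse(actual, forecast):
    """Root Mean Squared Error."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

def mae(actual, forecast):
    """Mean Absolute Error."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(a - f)))
```

Reporting several of these side by side, as the cited studies do, makes results comparable across scales and loss shapes.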
Second, hybrid approaches, whether based on ANNs, SVMs, or ARIMA, outperform their separate component approaches. The earliest hybrid approach paper in our pool is Park et al. [173]. They present a hybrid ARIMA-based model which splits the load forecast into three parts: the nominal load is dealt with by a Kalman filter; the type load for weekend load prediction is addressed with the exponential smoothing method; and the residual load is forecasted by an AR model. Jin et al. [176] introduce a Hybrid Optimization Grey Model (HOGM) based on segmented grey correlation and a multi-strategy contest. Most recently, Mason et al. [85] apply an ANN with Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) to energy forecasting. Singh and Dwivedi [172] introduce a combination of an ANN and a Follow-The-Leader algorithm (ANN-FTL). In Muralitharan et al. [171], the authors assess an ANN-based Genetic Algorithm (NNGA).
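The decomposition idea behind such hybrid models can be illustrated with a deliberately simplified two-stage analogue (not Park et al.'s actual Kalman-filter model): smooth the level of the series, then fit an AR(1) model to the residuals and add both forecasts back together:

```python
import numpy as np

def ses(x, alpha=0.5):
    """Simple exponential smoothing; returns one-step-ahead fitted values."""
    x = np.asarray(x, float)
    fit = np.empty_like(x)
    fit[0] = x[0]
    for t in range(1, len(x)):
        fit[t] = alpha * x[t - 1] + (1 - alpha) * fit[t - 1]
    return fit

def hybrid_forecast(history, alpha=0.5):
    """Two-stage hybrid sketch: smoothed level plus AR(1) on the residuals."""
    history = np.asarray(history, float)
    fit = ses(history, alpha)
    resid = history - fit
    # least-squares AR(1) coefficient on the residual series
    phi = np.dot(resid[:-1], resid[1:]) / max(np.dot(resid[:-1], resid[:-1]), 1e-12)
    level_next = alpha * history[-1] + (1 - alpha) * fit[-1]
    return float(level_next + phi * resid[-1])
```

Each component handles the structure the other misses, which is the reason hybrid models tend to beat their individual parts.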
Third, studies historically focused on point forecasts. The recent transformation of the electricity sector has motivated research on probabilistic forecasting approaches. These approaches forecast a distribution of values and thus provide additional information which is especially valuable when consumption is highly volatile. Such approaches include Nonparametric Probability Density Estimation, Bayesian Models, Sparse Heteroscedastic Models, and Quantile Regression [6]. In general, the known evaluation measures from point forecasting can also be applied to probabilistic forecasting. However, specific measures for probabilistic forecasting have also been developed. The two default measures are the pinball loss function and the Continuous Ranked Probability Score (CRPS) [6].
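The pinball loss can be sketched in a few lines: for a forecast of quantile q it penalizes under-forecasts with weight q and over-forecasts with weight 1 − q, so averaging it over many quantiles scores an entire forecast distribution:

```python
import numpy as np

def pinball_loss(actual, quantile_forecast, q):
    """Average pinball (quantile) loss for a forecast of quantile q in (0, 1)."""
    a = np.asarray(actual, float)
    f = np.asarray(quantile_forecast, float)
    diff = a - f
    # under-forecast (diff >= 0) weighted by q, over-forecast by (1 - q)
    return float(np.mean(np.where(diff >= 0, q * diff, (q - 1) * diff)))
```

For the 0.9 quantile, for example, an observation above the forecast is penalized nine times as heavily as one the same distance below it.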
Fourth, in studies comparing many different approaches over multiple data sets, Partial Least-Squares Regression (PLSR), the Nadaraya-Watson Estimator (NWE), GRNN and the double seasonal Holt-Winters exponential smoothing method perform best. Dudek [161] assesses the performance of multiple ANN-based approaches, as well as ARIMA, exponential smoothing, Principal Components Regression, PLSR, NWE, a Fuzzy Neighborhood Model, k-means based models, and Artificial Immune Systems. Based on the evaluation of four different data sets from Poland, France, Great Britain, and Australia, GRNN performs better than all ANN benchmarks, but the non-ANN models PLSR and NWE perform even better. GRNN is the simplest and fastest of the models, as it only has to estimate one parameter. Taylor and McSharry [63] compare ARIMA, periodic AR, an extension of Holt-Winters exponential smoothing for double seasonality, an alternative exponential smoothing formulation, and a method based on principal component analysis (PCA) of the daily demand profiles. Measured by the achieved MAPE and MAE, the double seasonal Holt-Winters exponential smoothing method consistently performs best.
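The Nadaraya-Watson estimator illustrates why such kernel models are fast to build: apart from the kernel choice, only the bandwidth has to be estimated. A minimal one-dimensional sketch:

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Gaussian-kernel regression: kernel-weighted average of training targets.
    The bandwidth is the single parameter to estimate."""
    x_train = np.asarray(x_train, float)
    y_train = np.asarray(y_train, float)
    x_query = np.atleast_1d(np.asarray(x_query, float))
    preds = []
    for xq in x_query:
        w = np.exp(-0.5 * ((x_train - xq) / bandwidth) ** 2)
        preds.append(np.dot(w, y_train) / w.sum())
    return np.array(preds)
```

GRNN is essentially the same construction over multivariate inputs, which explains why Dudek [161] finds it the simplest and fastest model in the comparison.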
Finally, one notable recent stream of forecasting literature uses disaggregated consumption data to perform forecasts on a more granular spatial level and afterwards aggregates the results to a system-wide forecast. Such forecasts can, for example, be conducted for individual households [169] or for conventional consumption and EV consumption [163]. They are able to outperform aggregated forecasting methods, but rely on more granular input data. Approaches for short-term forecasting of individual consumption are reviewed below. Summary Good consumption forecasting can lead to substantial financial savings [11] and is therefore crucial. In its early days, forecasting research often lacked proper out-of-sample evaluation and rigorous benchmarking [10]. Nowadays, this can be regarded as standard practice. Simultaneously, the number of available forecasting approaches has increased strongly. Numerous studies demonstrate the superiority of their proposed approach over others on one specific data set. This shows that every application scenario must be addressed specifically and that no one-fits-all approach exists. For instance, different models might be appropriate for different time horizons [154,177]. When developing a forecast for a certain use case, multiple approaches should therefore be deployed before selecting an appropriate one. It can be considered best practice for authors to compare their new method with various state-of-the-art methods, thus connecting their work to the existing tree of knowledge. More complex approaches, such as dynamic, hybrid, probabilistic and disaggregated models, often achieve better accuracy, but require higher modelling effort and longer training times. Studies that assess various approaches with regard to these criteria across multiple data sets could provide valuable new insights for the forecasting community.
The field of peak consumption forecasting is relatively small and can be expected to gain importance in future systems with reduced controllable generation and increasing grid congestion problems. Probabilistic forecasting can provide useful additional information about the distribution of expected values in cases of highly volatile consumption.

Short-term forecasting of individual consumption
In an electricity system with multiple distributed technologies, such as rooftop solar panels, home battery storage, smart meters, and controllable smart home appliances, both the need and the options for forecasting individual consumption are greater. Potential use cases are efficient building operation, as well as optimization [18] and smart storage operation. Compared with system-wide forecasting, the forecasting of individual consumption is a more recent stream of research. We classify 13 studies, plus eleven reviews, in this category. All studies forecast the total consumption of households in a given time interval for a short-term horizon. Fan et al. [179] also forecast daily peak values. Approach overview Most high-impact studies that forecast short-term individual consumption use ANNs [7,8,15,20,169,180,181] or ANN-based hybrid approaches [15,18,19,23,171,179,182–185]. In addition, SVR [19,20,23,183,186,187], SVR-based hybrid approaches [19], and Bayesian Networks [188] are used. Data sets The data for these studies comes from office buildings [180], residential buildings [169,183,185,186,188], commercial buildings [181,184], public sector buildings [181,182,185,187], private EVs [189], and mixed-use buildings [179]. Most studies use one type of data set to evaluate their method. The length of the time series used varies between studies and ranges from ten days [187] to five years [185]. The time granularity of the data is usually between 15 minutes and one hour. Several studies utilize very granular consumption data in the range of one to five minutes [183,185,188]. Notably, the authors of [188] use appliance-level data measured in six-second intervals. Available data sets are usually split into training and test sets, and sometimes an additional validation set. Typically, training sets contain 60-80% of the data. The largest training set share, 90%, is used in [181].
In general, accuracy tends to increase with training set size; for instance, in [188] from 82% at a 25% share, to 86% at 50%, and 90% at 75%. Compared with system-wide consumption forecasting, training set shares tend to be larger. This might indicate additional difficulty in forecasting individual consumption. Feature selection All forecasting studies that focus on buildings use historical consumption data as an input feature. Multivariate models utilize additional external variables as input features. The most common are temperature-related (6), followed by type of day (4), month or season (4), and solar radiation (3). When using external variables, studies should ensure that only information is used that would, in reality, be available at the time of forecasting. This aspect poses a notable limitation to Cai et al. [181], who use actual "future" weather data as an external input rather than the weather forecast. The authors add "white noise" cases for robustness analysis, but this still assumes that weather forecasting errors follow a Gaussian distribution, which impacts practical usability. We recommend choosing one of three other methods to integrate weather data in consumption forecasts, in line with [5]: forecasters could either (a) use historical weather forecasts directly, (b) rearrange the original historical weather data with, e.g., bootstrap methods, or (c) create a mathematical weather forecasting model and use its output as input for the consumption forecast.
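Option (b) can be sketched with a simple block bootstrap that resamples the historical weather series in contiguous blocks, preserving short-range weather structure while avoiding the use of the exact future realization. This is an illustrative sketch, not the procedure of any specific reviewed study:

```python
import numpy as np

def block_bootstrap_weather(temps, block_len, rng=None):
    """Resample a temperature series in contiguous blocks of length block_len,
    keeping short-range autocorrelation but breaking the exact realization."""
    if rng is None:
        rng = np.random.default_rng()
    temps = np.asarray(temps, float)
    n = len(temps)
    out = []
    while len(out) < n:
        start = rng.integers(0, n - block_len + 1)
        out.extend(temps[start:start + block_len])
    return np.array(out[:n])
```

Feeding several such resampled weather series through the consumption model yields a spread of forecasts that reflects weather-input uncertainty without assuming Gaussian forecast errors.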
The EV forecasting study [189] represents a special case, as it does not use historical consumption data as input, but instead relies on kinematic parameters of trips -such as distance, travel time, and temperature -and of cars -such as acceleration.
In general, a higher number of input features tends to improve forecasting accuracy, but also increases the risk of overfitting. Some studies pay special attention to feature selection. Neto and Fiorelli [180] employ Recursive Feature Elimination (RFE) for feature selection. Li et al. [182] utilize PCA. In Cai et al. [181] and Fan et al. [179], the authors select external features based on the Pearson Correlation Coefficient and the Coefficient of Determination of feature values and consumption values, respectively. Another promising approach is to utilize variables from similar surrounding buildings via cross-correlation or mutual entropy methods [3]. Benchmark approaches All building forecasting studies benchmark their proposed approach against others. Benchmarks range from naive baseline persistence models to more advanced physical, statistical and Machine Learning approaches. Notably, some studies compare a wider selection of approaches from different categories [179,183,185]. Evaluation metrics Unlike system-wide forecasting studies, the 13 reviewed studies of individual consumption forecasting do not show a common, default error measure. Instead, a variety of measures can be observed, including MAPE (6), RMSE (5), Coefficient of Variation (5), MAE (4), and a long tail of twelve more measures. One key reason for this variety is that certain conventional error metrics like MAPE become impossible to calculate when values are zero, and very high when values are close to zero, which is likely to occur for individual consumption. The resulting diversity in measures limits the comparability of studies. Therefore, reporting multiple error metrics is advisable. Forecasters should also be aware that MAPE, RMSE, MAE, and MSE are point-wise measures which thus double-penalize models that forecast the shape of the consumption curve well, but get the timing wrong.
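Correlation-based external feature selection, as used in Cai et al. [181], can be sketched as ranking candidate feature columns by their absolute Pearson correlation with the consumption target (a minimal illustration, not the authors' pipeline):

```python
import numpy as np

def rank_features_by_pearson(features, target):
    """Rank candidate feature columns by |Pearson correlation| with the target.
    features: array of shape (n_samples, n_features); target: (n_samples,)."""
    X = np.asarray(features, float)
    y = np.asarray(target, float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    order = np.argsort(-np.abs(corr))  # most correlated first
    return order, corr
```

The top-ranked columns would then be retained as model inputs; note that Pearson correlation only captures linear dependence, which is one reason studies also use PCA or mutual-information-style criteria.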
For applications which have a certain tolerance for mistiming, it can therefore be more appropriate to conduct a restricted permutation of the original forecast and select the one that minimizes the error [3]. Last, probabilistic forecasts demand new evaluation metrics. The pinball loss function has seen widespread use and provides easy implementation and communication [5]. Complexity and running speed Similarly to system-wide forecasting, only a few studies explicitly state the complexity and computational efficiency of their models. In the reported cases, most models can be trained and run in a matter of minutes on standard personal computers. Nevertheless, the time for building and training models can vary substantially. Training times might vary between two seconds and five minutes depending on the approach [179]. Similarly, comprehensive feature selection takes additional time; Fan et al. [99] report between nine seconds and 50 minutes. Reducing the number of features can decrease the training time of the model. In ensemble models, the weighting step takes additional time.
As the computational and modelling effort can be substantial in real-world use cases, authors are encouraged to report them. Furthermore, authors who wish to demonstrate the usability of their approaches for real-time applications are encouraged to report training times. Notable approaches First, ANN models can perform better when trained separately for working days and non-working days, i.e. weekends and holidays. The ANN models in [180] achieve average errors of 10.8 (working days) and 10.5 (non-working days), compared with 21.0 for a combined model.
Second, hybrid approaches tend to outperform their individual component approaches both for total energy consumption and peak power forecasting [179,185]. On the downside, hybrid approaches demand higher computational and modelling effort. This should be weighed against the gains in accuracy from a hybrid approach, especially when those gains are minor, as reported in Zhang et al. [187].
Third, current Deep Learning approaches such as Deep Belief Networks (DBN) can outperform many advanced approaches such as BP-ANN, ELM, and SVR [21,184]. Summary In decentralized electricity systems, forecasting short-term consumption at a distributed level gains importance. This new challenge can be tackled with tailored solutions, as the various approaches reviewed in this section show. The selected forecasting time horizon can influence the suitability of approaches [171]. The state of the art in short-term forecasting of individual consumption includes careful feature selection, hybrid approaches and deep learning approaches, all of which come at higher modelling and computation costs than conventional approaches, which must be weighed against accuracy improvements for each use case. In comparison with system-wide consumption forecasting, no studies use ARIMA-based models, training sets tend to be larger, and more attention is accorded to feature selection. When used as benchmarks, ARIMA-based models are outperformed by others. This suggests that ARIMA-based models might be less suitable for capturing the higher volatility in individual consumption profiles. The larger training sets and more sophisticated feature selection methods indicate the higher requirements of individual consumption forecasting.
In addition, no default error measure exists. We propose using the MASE, as it does not rely on division by the actual consumption value and is thus well suited to individual consumption values, which can be close to zero at times; it enables comparability across data sets and scales, penalizes positive and negative errors equally, and can be easily interpreted. In cases where large forecasting errors lead to over-proportionally large losses, it is adequate to also report non-linear loss metrics such as the root average squared error (RASE). For applications where the shape of the consumption curve is more important than the timing, we propose conducting a restricted permutation of the original forecast and selecting the error-minimizing forecast.
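A minimal sketch of the MASE: the out-of-sample MAE is scaled by the in-sample MAE of a (seasonal) naive forecast, so the measure stays defined for zero actuals and is comparable across scales. A value below 1 means the forecast beats the naive benchmark on average:

```python
import numpy as np

def mase(actual, forecast, train, m=1):
    """Mean Absolute Scaled Error: out-of-sample MAE scaled by the in-sample
    MAE of a seasonal-naive forecast with period m (m=1 is the naive forecast)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    train = np.asarray(train, float)
    scale = np.mean(np.abs(train[m:] - train[:-m]))
    return float(np.mean(np.abs(a - f)) / scale)
```

For half-hourly data with daily seasonality, m would be 48 instead of 1.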
Most studies use one type of data set to evaluate their method, which limits the generalizability of their findings. We therefore encourage authors to a) assess various approaches, b) apply their model to a reference data set which has been used by other studies in the past, c) report accuracy, computational setup and running time as well as model building effort, and d) calculate and present multiple common error measures for evaluation. This way, future studies can provide valuable new insights for the forecasting community and foster convergence of research in this field.
The field of peak consumption forecasting is relatively small and offers future potential, for example with respect to DR, as electricity tariffs with peak demand and peak capacity charges gain attention [190] . For this and further use cases, probabilistic forecasting can be expected to play a large role as individual consumption exhibits higher volatility and uncertainty than system-wide consumption.

Long-term forecasting of system consumption
Long-term electricity consumption forecasting at a system or region level supports adequate planning of generation and grid expansion, as well as trading on electricity markets. Most studies in our pool focus on electricity consumption. A few papers also forecast other energy sources in addition to electricity, such as petroleum [191] and natural gas [192], or the self-sufficiency rate [114]. Approach overview The most popular approaches for long-term forecasting are ARIMA models, grey models, SVRs and ANNs in different forms and combinations. If a linear regression benchmark is used, the neural networks outperform this benchmark (e.g. in Azadeh et al. [30], Kaytez et al. [45], Ekonomou [193], Azadeh et al. [194]). Data sets Most often the data comes from China, Iran, Turkey and the US, although individual papers also examine the electricity consumption of the UK [41] or Taiwan [191]. Feature selection Although the methods are transferable from one data source to another, some external influences vary across countries. For example, Zeng et al. [195] find that taking the GDP into account is more important when forecasting the energy consumption of China than of the US. Most papers, however, do not compare their methods across data sets. Generally, including weather and socioeconomic factors seems to improve the forecasts [41,43,160]. Wu et al. [196] find that the total population is the key factor in forecasting the consumption of the Shandong Province in China, while Kavaklioglu [44] also includes features such as imports and exports. In Hamzacebi and Es [197], the authors include primary energy sources to forecast the yearly electricity consumption of Turkey. Azadeh et al. [194] include the electricity price for each sector, the number of consumers, electricity intensity, value added, consumption, and the price-weighted mean of fossil fuels.
Benchmark approaches When a clear benchmark model is used, it is mostly a linear regression. Otherwise, most authors compare their version of the forecasting model with other versions of the same model. For example, AL-Musaylh et al. [178] introduce a new method called improved complete ensemble empirical mode decomposition with adaptive noise. They compare all models (SVR, PSO) with and without the new method. Other authors use a simpler variation of the investigated algorithm; for example, in Zeng et al. [195], the authors compare their adaptive differential evolution backpropagation neural network with a simple neural network. Evaluation metrics The performance criteria are almost always the MAPE and RMSE. Other criteria are also used; for example, when deploying an ARIMA model, information criteria such as the BIC are reported (e.g. Barak and Sadegh [198], Ediger and Akar [199]). Some papers also include ANOVA reports in their analysis [30]. Notable approaches Kheirkhah et al. [200] present a hybrid model based on an ANN for forecasting, PCA for feature selection, and Data Envelopment Analysis (DEA) to compare the constructed ANN models as well as ANN learning algorithm performance. The average, minimum, maximum and standard deviation of the MAPE of each constructed ANN are used as the DEA inputs. Analysis of variance (ANOVA) is used to determine the best structure in the group identified by DEA. The model is applied to monthly load data from Iran and compared with GA, Fuzzy Regression, ANN, and ANFIS. The proposed model achieves the lowest MAPE of 0.01, compared to 0.14 (GA), 0.082 (FR), 0.156 (ANN), and 0.155 (ANFIS).
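BIC-based model selection can be illustrated on a plain AR model fitted by least squares (a Gaussian approximation for illustration, not the full ARIMA fits used in the cited studies): the BIC trades goodness of fit against the log(n)-weighted number of parameters, and the order with the lowest BIC is chosen:

```python
import numpy as np

def ar_bic(x, p):
    """Fit AR(p) by least squares and return its BIC (Gaussian approximation)."""
    x = np.asarray(x, float)
    n = len(x) - p
    # design matrix of lagged values, plus an intercept column
    X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(n), X])
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = p + 1  # coefficients including the intercept
    return float(n * np.log(rss / n) + k * np.log(n))

def select_order(x, max_p):
    """Pick the AR order that minimizes the BIC."""
    return min(range(1, max_p + 1), key=lambda p: ar_bic(x, p))
```

Because the penalty grows with log(n), the BIC favours parsimonious models on long consumption series, which is why it is a common complement to pure accuracy metrics in ARIMA studies.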
Ekonomou [193] compares a linear regression, a neural network and an SVR. They find that the neural network and SVR perform similarly when forecasting the yearly energy consumption of Greece, while both perform better than the linear regression. The grey models are not compared with either a neural network or an SVR. However, hybrid grey models, such as a grey model in combination with ARIMA [201], or a grey model in combination with genetic programming [202], improve the accuracy when forecasting the yearly energy consumption of China. Additionally, the natural gas consumption of India is found to be overestimated by the planning commission when compared with the results of a grey Markov model [192].
The authors of He et al. [203], Xiong et al. [204] and Hyndman and Fan [42] examine probabilistic long-term forecasting. While the first paper forecasts the yearly consumption in parts of the US and China, the second paper forecasts intervals for the monthly power on two power lines in the US. Both papers use some form of neural network in combination with more statistical approaches such as LASSO and Holt-Winters. The last of the three papers uses a two-step methodology to forecast the probability distribution of annual and weekly peak electricity demand for South Australia. In the first step, semi-parametric additive models estimate the effects of external variables, such as calendar and weather variables, on demand. In the second step, the demand distribution is forecasted using simulated temperatures, economic scenarios and bootstrapping. Summary A range of approaches exists for forecasting the long-term electricity demand of systems. Overall, there is a need for clear benchmark models to make these approaches more comparable. Additionally, the number of probabilistic forecasts is rather low and should be investigated in more detail in the future. Careful feature selection can also improve the forecasts; especially social and economic variables play an important role.

Long-term forecasting of individual consumption
Long-term consumption forecasts for buildings and individual customers are useful for decision making regarding the installation of distributed energy resources, like rooftop solar and battery storage, as well as the development of DR programs. We categorize five papers in this category. Approach overview The retrieved studies use ANNs [64,171,188,205], SVRs [188,206], DTs [64] and linear regressions [64].
Data sets While two papers forecast the consumption of business buildings in Singapore [206] and Hong Kong [64] respectively, two papers utilize the Pecan Street data set from Texas, USA [171,205] or compare the results on a variety of data sets [188,205]. Benchmark approaches The authors of Tso and Yau [64] and Rahman et al. [205] compare different models with the well-established linear regression approach. The other studies compare new approaches only among each other [171,188], or use a theoretical comparison [206]. Feature selection In Tso and Yau [64], the authors include a feature selection in the analysis, which finds that summer and winter features differ slightly. While the most important features in summer are flat size, number of household members and air-conditioner ownership, in winter the housing type also plays a significant role. The other studies do not conduct an explicit feature selection. Evaluation metrics The evaluation metrics comprise the MSE (used twice), CV-RMSE, accuracy, precision, recall, Pearson Coefficient, RMSE, and the root average squared error (RASE) (all used once). Similarly to short-term forecasting of individual consumption, values can be close to zero, which renders conventional error metrics like MAPE impractical. To enable comparability across studies, reporting multiple error metrics is useful. Notable approaches The oldest of the analyzed papers, Dong et al. [206], is a feasibility study for SVRs. They find that an SVR is able to predict the monthly energy consumption of four office buildings in Singapore. An SVM is used in Singh and Yassine [188]. However, they find that the SVM, as well as the ANN, is outperformed by their proposed model combining frequent pattern mining, association rules and Bayesian networks on appliance-based data sets.
The authors of Rahman et al. [205] apply an RNN for forecasting hourly electricity consumption for a) a commercial building, and b) aggregated residential buildings from the Pecan Street data set. Interestingly, they find that the proposed deep RNN outperforms an MLP for the single commercial building, but not for the aggregated residential buildings. They hypothesize that this is due to the RNN's strength of identifying long-term dependencies, which occur less in aggregated consumption profiles.
On the same data set, Muralitharan et al. [171] apply an ANN to predict the daily, monthly and yearly demand of the buildings. They combine an ANN with genetic algorithms (NNGA) and particle swarm optimization (NNPSO) and compare the performance of the models for different time horizons. Interestingly, the NNGA is best suited for short-term forecasting, while the NNPSO is superior in the long run. Summary With only a few papers looking at long-term forecasts for buildings, there still seems to be some potential for further analysis. However, for planning purposes, where the need for long-term forecasts is high, the building level seems to be less important. Most likely, long-term building-level consumption only changes when there is a change in inhabitants, and thus a short-term forecast is the most useful.

Consumption analysis
Consumption analysis is an exceptionally broad field of application with a variety of use cases. Studies can be categorized regarding customer type (household, commercial, industry), time horizon, and spatial scope of analysis (Zhou and Yang [4]). Most of the retrieved studies focus on the household level. Approach overview Observed approaches include (i) a combination of SOM and k-means (for clustering) with decision trees (DTs) and a rule set for classification [36], (ii) conditional demand analysis [207], (iii) an adaptive k-means algorithm with feature extraction and final hierarchical clustering depending on segmentation criteria [48], (iv) ANFIS [208], (v) finite mixture model-based clustering [39], (vi) unsupervised data clustering plus frequent pattern mining analysis [188], and (vii) association rule mining [209]. Data sets The majority of studies analyzes data from households [36,39,48,188,207,209]. The authors of Singh and Yassine [188] utilize data at the appliance level. In Sefeedpari et al. [208], the authors analyze the consumption data of 50 dairy farms. Time resolution ranges from six seconds [188] to one hour [48]. The length of the time series ranges from six months [36] to two years [188], but the majority of studies uses data of one calendar year. Benchmark approaches Only two reviewed studies compare their approach to a benchmark. The authors of Aydinalp-Koksal and Ugursal [207] compare their CDA to both an ANN and an engineering model (ENG). In Sefeedpari et al. [208], the authors compare their ANFIS approach to a linear regression. Two other studies employ cross-validation [36] or bootstrapping [39] for robustness checks. Feature selection All studies use historical real-world consumption data as an input feature.
Other input features are seasonal scores, weekend/weekday scores, dwelling characteristics, socio-demographic factors, attitudes towards energy, characteristics of households' appliances, and usage of other energy carriers, such as diesel, gasoline, kerosene, and natural gas.

Evaluation metrics
The set of observed evaluation metrics is as broad as the set of approaches and contains mean index adequacy, accuracy, R², CV, RMSE, MAPE, classification uncertainty, entropy, standard deviation, and sum of squared errors (SSE).

Notable approaches
A highly relevant field of analysis is the classification and segmentation of customers. Classifying and segmenting residential, commercial and industrial customers enables appropriate electricity tariffs, DR programs and energy efficiency programs to be marketed effectively. The authors of Figueiredo et al. [36] apply a combination of an SOM and k-means for clustering, and subsequently classify customers with a DT and a rule set. The final classification accuracy is 81% for working days and 74% for weekends. In Kwac et al. [48], the authors segment residential customers with a three-step model. First, they create a dictionary of representative load shapes by modeling the distribution of load shapes and clustering them with an adaptive k-means algorithm. Second, they extract dynamic features from the encoded data utilizing the pre-processed dictionary. Third, they perform a hierarchical clustering depending on segmentation criteria such as lifestyle or usage variability. Entropy is used for capturing customer variability. The results show that it is possible to use customers' load shape profiles to calculate their level of use and entropy. The authors of Wang et al. [209] present an association rule mining algorithm. Their results suggest that socio-demographic factors such as employment status and number of occupants have strong, significant associations with typical electricity consumption patterns (TECPs).
In addition, the results indicate that attitude-related factors have almost no effect on TECPs. Last, households with more than one person are more likely to change their TECP across seasons.

Summary
Consumption analysis can be applied to a variety of use cases, as reflected by the range of studies reviewed in this category. The most prominent sub-field is the classification and segmentation of customers. Benchmarks are less common than in other areas. To improve comparability and convergence, future studies are encouraged to adopt rigorous benchmarking. In this regard, researchers can learn from best practices in Short-term Forecasting of System Consumption and Short-term Forecasting of Individual Consumption, for instance.
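The point-error metrics that recur throughout this section (SSE, RMSE, MAPE, R²) are simple to compute. The following Python sketch is purely illustrative and not taken from any of the reviewed studies:

```python
import math

def sse(actual, predicted):
    """Sum of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sse(actual, predicted) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error (in %); assumes no zero actuals."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination R^2 relative to the mean predictor."""
    mean_a = sum(actual) / len(actual)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - sse(actual, predicted) / ss_tot

# Toy example: hourly consumption in kWh vs. a forecast
actual = [2.0, 3.0, 4.0, 5.0]
predicted = [2.2, 2.9, 4.1, 4.8]
```

Note that MAPE is undefined for zero consumption values, which is one reason why studies typically report several metrics side by side.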

Consumption clustering

Benchmark approaches
The most exhaustive clustering analysis of customer load data is performed in Chicco [13]. The author compares 15 different methods and evaluates them with nine different metrics. One of the central results is that the best performance is achieved by clustering methods which can effectively isolate outliers.

Feature selection
The most commonly used features are those which can be derived from the existing data, such as the relative average power in each period [39] or the number of morning peaks [31]. Additionally, weather information [210] and calendar information [39] seem to improve the clustering process, as does any other information about consumers, such as the decade a building was built in [16].

Evaluation metrics
Typical evaluation measures for clustering are the connectivity index, Dunn index [31], silhouette index [31,188], entropy [39], and cophenetic correlation coefficient [210]. Using multiple evaluation measures helps to avoid local optima.

Notable approaches
Grouping the customers before forecasting seems to improve the forecasting accuracy [169], where the accuracy depends on the clustering quality and stability. This stability can be evaluated using bootstrapping, as in Haben et al. [39]. However, there appears to be a trade-off between the prediction accuracy on each cluster and the cluster stability [16].
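Bootstrapping-based stability checks require comparing two clusterings of (resampled) data. One simple, label-invariant agreement score is the Rand index; the sketch below is a generic illustration and not the exact procedure of Haben et al. [39]:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index: share of point pairs on which two clusterings agree
    (both in the same cluster, or both in different clusters).
    1.0 means the partitions are identical, regardless of label names."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Two runs of a clustering algorithm on resampled data; swapped
# cluster ids (0 vs. 1) do not affect the score.
run1 = [0, 0, 1, 1]
run2 = [1, 1, 0, 0]
print(rand_index(run1, run2))  # 1.0 despite different label names
```

Averaging such agreement scores over many bootstrap resamples yields an estimate of cluster stability.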
The authors of Arias and Bae [210] conduct an agglomerative hierarchical clustering of traffic patterns in order to forecast EV electricity consumption. A grey relational analysis is subsequently performed to identify factors influencing traffic volume. Last, a DT is applied to connect influencing variables (input) to clusters of traffic patterns (output).
In Biscarri et al. [31], the authors cluster consumption data from 281 customers in Spain. Consumption profiles are clustered with a variety of different algorithms, namely hierarchical clustering, k-means, Diana, PAM, Fanny, Clara, SOM, SOTA, and model-based clustering. No algorithm can identify the optimum for all categories of customers. It can thus be concluded that, similar to forecasting, the most appropriate clustering approach depends on the given data sample.

Summary
Clustering techniques are difficult to compare and depend to a large extent on the use case at hand. However, there is a general trade-off between cluster stability and prediction accuracy. Additionally, any information about the data to be clustered, such as weather at the same location or meta-information about the building at hand, improves the clustering process.
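To make the comparison concrete, the most basic of the algorithms above, k-means on daily load profiles, can be sketched in a few lines. This is a didactic toy (fixed iterations, random initialization), not the implementation of any reviewed study; real analyses would additionally score the result, e.g. with the silhouette index:

```python
import random

def kmeans(profiles, k, iters=20, seed=0):
    """Plain k-means on equal-length load profiles (lists of floats).
    Returns a cluster label per profile and the final centroids."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(profiles, k)]
    for _ in range(iters):
        # Assign each profile to its nearest centroid (squared distance)
        clusters = [[] for _ in range(k)]
        for p in profiles:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Move each centroid to the mean of its assigned profiles
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = [sum(v) / len(cl) for v in zip(*cl)]
    labels = []
    for p in profiles:
        d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(d.index(min(d)))
    return labels, centroids

# Toy daily profiles: two "morning peak" and two "evening peak" customers
profiles = [[5, 1, 1], [4, 1, 2], [1, 1, 5], [2, 1, 4]]
labels, _ = kmeans(profiles, k=2)
```

On this toy data, the two morning-peak and the two evening-peak profiles end up in separate clusters.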

Consumption control
Building a control model itself is not so much a Data Analytics task as a modelling task, and thus outside the scope of this review. However, control models still rely on reliable Data Analytics approaches, e.g. in order to determine when to take an (automated) action. Since control tasks require long-term reliability, this represents a challenging task, as external factors change over time, reducing the performance of previously appropriate Data Analytics approaches.

Approach overview
There is one study in this category, which employs an ANN [211].

Data sets
This category contains one study that uses 15-minute data from a tertiary building in Italy. 70% of the data is used for training, 15% for validation and 15% for testing. Notably, the study finds that a collection period of only about two months for the training data set is sufficient for their use case, as it allows a reliable hourly energy model with a MAPE of 9.53% on the test set to be created.

Feature selection
Input features include consumption data, calendar variables, external temperature, illuminance, relative humidity, and the number of people inside the building.

Benchmark approaches
Two methods are compared for model retraining: Mobile Training and Growing Training.

Evaluation metrics
The MAPE is used for evaluation.

Notable approaches
The study presents an approach for training an ANN, continuously checking its test performance and automatically retraining it if necessary. After the ANN is trained, its accuracy is evaluated every week. If the previously identified threshold is passed, retraining is initiated. The two retraining approaches perform similarly well, with MAPEs of 6.77% (mobile training) and 6.49% (growing training).

Summary
Good Data Analytics-based models are the key precondition for efficient consumption control [1].
One key challenge is the adequate selection of input features, mainly thermal property variables, climatic variables, and occupancy variables. Furthermore, in the light of increasing electrification of industry processes, efficiency goals, and the spread of time-varying tariffs and peak demand charges, automated, Data Analytics-driven consumption control can be expected to gain further relevance in the future.
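The monitor-and-retrain loop described above can be sketched generically. All names (`model`, `retrain`, the 10% default threshold) are illustrative placeholders, not details of the study in [211]:

```python
def mape(actual, predicted):
    """Mean absolute percentage error (in %); assumes no zero actuals."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def monitor_and_retrain(weekly_batches, model, retrain, threshold=10.0):
    """Evaluate the model's MAPE on each new weekly batch and trigger
    retraining whenever the error threshold is exceeded.
    `model` maps an input to a prediction; `retrain` returns a new model."""
    retrained_weeks = []
    for week, (inputs, actual) in enumerate(weekly_batches):
        predicted = [model(x) for x in inputs]
        if mape(actual, predicted) > threshold:
            model = retrain(model, inputs, actual)
            retrained_weeks.append(week)
    return model, retrained_weeks
```

The design choice to retrain only on threshold violation, rather than on a fixed schedule, keeps computation low while still tracking drift in external factors.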

System
Two studies and one literature review simultaneously analyze more than one area of the electricity value chain.

Approach overview
Both studies use ANN-based approaches. The authors of Tiwari et al. [212] apply a conventional ANN. In Xiao et al. [213], the authors introduce a hybrid forecasting model based on Singular Spectrum Analysis (SSA), combined with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method for fast local convergence and a modified Wavelet Neural Network (WNN) with an Improved Cuckoo Search Algorithm (CS) for optimizing the initial weights and the dilation and translation parameters of the WNN (SSA-BFGS-CS-WNN).

Data sets
The authors of Xiao et al. [213] use half-hourly load and price data from Australia, and 10-minute wind speed data from China. Training is performed on 12 days (i.e. 92% of the data), testing on 1 day. In Tiwari et al. [212], the authors use data from a special setup comprised of two areas, each including two 900 MVA machines and a 187 MVAr capacitor, which are connected by a 220 kV double circuit line.

Feature selection
Historical consumption values are used as input features in Xiao et al. [213].

Benchmark approaches
The approach proposed in [213] is compared with several benchmarks, namely a BPNN, a genetic algorithm-optimized BPNN (GABPNN), a radial basis function neural network (RBFNN), a WNN, and a cuckoo search-optimized WNN. The authors of Tiwari et al. [212] compare their approach to a 'conventional controller'.

Evaluation metrics
MAPE, MSE, MAE and computation time are reported and discussed in Xiao et al. [213].

Notable approaches
In Tiwari et al. [212], the authors investigate how an ANN can be used to control a Unified Power Flow Controller (UPFC) to improve the transient stability performance of a power system. The study introduces a two-zone system with different levels of power flowing from Zone 1 to Zone 2 through a 220 kV double circuit line with a length of 220 km.
One transmission line is subjected to a short circuit fault for a duration of 200 ms. The proposed method damps the interzone and local modes of oscillation in the system very effectively in all the cases under consideration, as compared with the conventional controller. It performs satisfactorily even at those operating points where the regular PI controller fails to stabilize the system.
The authors of Xiao et al. [213] apply a hybrid approach to forecast short-term electricity generation, load and price. The proposed SSA-BFGS-CS-WNN outperforms all benchmarks in all three areas. It results in MAPE reductions of 46% (load), 32% (wind speed) and 26% (price) compared with the next best approach. In addition, for load forecasting the computation time of the SSA-BFGS-CS-WNN (18.34 to 20.61) is much lower than that of the second-best CS-WNN (29.34 to 31.64), although it remains slower than other benchmarks with a worse forecasting performance.
In Liu et al. [12], the authors present a review of approaches for isolated electricity systems, such as islands, which acknowledges the importance of consumption forecasting when modelling such systems. Missing historical data and the high influence of consumption behavior can pose an extraordinary challenge in these environments, necessitating specifically tailored approaches. Methods based on ANN, ARIMAX, SARIMA, and SOM have been successfully applied. Another important aspect is precise generation forecasting of renewable resources. Apart from numerical models, studies have researched ANN, ARIMA, ARIMAX, and ANFIS for renewable generation forecasting.

Summary
Only a small number of studies covers system-related applications. In particular, ANN- and ARMA-based models have shown reliable performance in various system-related forecasting tasks. With a growing number of microgrids, local electricity markets, and multi-energy systems, this area contains abundant potential for future research, as isolated systems have special characteristics that create unique challenges.

Conclusion
In this review, we aim to provide a structured analysis of high-impact research related to Data Analytics in the electricity sector. Because of the uniquely broad scope of our review, we apply a hybrid search method, including manual and automated steps for selecting relevant literature. This allows us to identify and review the main streams of research in this field.

Key findings and applicable knowledge
We first provide a high-level overview of the research landscape. We discover that the number of related articles is growing rapidly, outpacing other fields of research. A large share of high-impact studies comes from Asian universities. Regarding international collaboration, China and the USA are at the center of the collaborative network in the field.
Next, we present an in-depth review. For this purpose, we classify retrieved studies along the three dimensions area, application, and approach. State-of-the-art approaches that can be seen across multiple areas include (hybrid) Machine Learning approaches based on ANNs, SVMs, SVRs, and DTs. Our findings indicate that no one-size-fits-all approach exists; the most appropriate approach highly depends on the context. Guiding questions that future researchers can ask themselves to find a suitable approach are: What kind of data is available? Is time series data used? How volatile is it? How important is the interpretability of results? Can a hybrid combination of different approaches help to tackle different characteristics of the problem? What is an acceptable computational effort and modelling complexity? Considering these aspects enables an appropriate Data Analytics approach to be tailored to each use case. The area subsections in this article provide orientation for researchers and practitioners to identify promising state-of-the-art approaches for their respective use case and to determine possible future research directions. We provide a summary of overall best practices and area-specific findings below.

Best practices
We observe a number of aspects that future Data Analytics research in the electricity sector should incorporate in order to move the field forward. Input data should always be thoroughly described and, ideally, benchmark data sets should be developed and used. Programming languages, software packages and data sources should be named and, if possible, data should be made available to enable other researchers to reproduce results. This goes hand in hand with the global movement of open source, open data and open science.
Moreover, newly proposed approaches should be compared to multiple state-of-the-art benchmarks. Evaluating multiple error metrics, model building effort, run time, and computational setup improves the comparability of studies. Showcasing the performance of an approach on more than one data set demonstrates its potential generalizability and helps in avoiding over-fitting.
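A systematic benchmark can be organized as a simple model-by-data set grid that reports every metric plus run time. The sketch below is a generic template of this best practice, not code from any reviewed study; the "persistence" and "mean" baselines are deliberately naive placeholders:

```python
import math
import time

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def compare(models, datasets, metrics):
    """Run every candidate model against every data set and collect
    all metrics plus run time in one table keyed by (model, data set)."""
    table = {}
    for m_name, forecast in models.items():
        for d_name, (history, actual) in datasets.items():
            t0 = time.perf_counter()
            predicted = forecast(history, len(actual))
            runtime = time.perf_counter() - t0
            scores = {k: f(actual, predicted) for k, f in metrics.items()}
            scores["runtime_s"] = runtime
            table[(m_name, d_name)] = scores
    return table

# Two naive benchmarks: persistence (repeat last value) and historical mean
models = {
    "persistence": lambda h, n: [h[-1]] * n,
    "mean": lambda h, n: [sum(h) / len(h)] * n,
}
datasets = {"toy": ([1.0, 2.0, 3.0], [3.0, 3.0])}
results = compare(models, datasets, {"RMSE": rmse, "MAE": mae})
```

Evaluating all candidates on identical splits with identical metrics is what makes the resulting numbers comparable across studies.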
In the spirit of transferring insights, it can be beneficial for researchers in a certain area to apply approaches and best practices to their problem that have proven valuable in other areas. For this, the in-depth area sections in this review can offer ideas and inspiration. We summarize the main area-specific findings below.

Area summaries
In the Generation area, we see that most publications focus on forecasting the generation of either wind or solar energy. As these renewable energy sources depend on weather conditions, advances in forecasting methods contribute to overall system stability and enable network as well as power plant operators to lower their costs. In this area, we see an example of the impact of push factors such as increasing computing power, which enables the development of more complex models. The growing number of advanced neural network approaches (e.g., ANFIS) and the combination of different approaches underline this argument. Random forest and regression tree approaches are an upcoming trend, and we suggest that researchers explore the capabilities of these approaches regarding generation data. Pull factors in future generation research will likely be a more detailed differentiation within the solar and wind groups. The generation from off-shore wind power plants is steadily increasing in many countries. This offers potential for future research, as the focus of current research is on-shore wind. Furthermore, rising renewable generation capacities will increase the need to predict generation curtailment by network operators in some systems and its impact on energy markets. Besides, the dependencies between distributed generation and consumption in distribution grids will become crucial for distribution network operators. Improved forecasting results on both sides (consumption and generation) will allow operators to predict critical system states (e.g. congestion) and enable them to counter or prevent these. However, this also requires spatially granular data.
The most prominent literature in the Trading area focuses on the forecasting of electricity prices, especially in the short term. An accurate forecast impacts all market participants, which explains the large number of studies in the area. However, there are still gaps in the understanding of the factors that contribute to the occurrence of extreme prices. In order to better deal with the distinct volatility of electricity prices, thorough selection of external features can be expected to gain importance. Furthermore, the recent introduction of renewable energy and smart grids has led to higher uncertainty of future long-term electricity prices. It is important to understand the limitations of traditional point forecasts in this respect and to focus more strongly on probabilistic forecasting, also in the short and medium term.
The review of the Data Analytics related literature in the area of electricity Transmission and Distribution shows that the field has yet to be consolidated. Currently, researchers borrow the methodology they need for certain isolated problems, but they do not necessarily adhere to standards in the Data Analytics community. That means that there are no benchmark data sets, no common evaluation metrics, and the analyses are often based on simulated data. The area is also very diverse in terms of research interests. One central research stream is certainly automated control, but other streams are also considered by multiple authors, namely non-technical losses and failure prediction. Additionally, many individual research streams can be identified in our analyzed pool. However, most research is motivated by push factors such as the increased availability of computing power and more data (even though most data in this area is still simulated). Pull factors only play a minor role. In the future, the community should begin a dialogue with transmission and distribution system operators to find out what they need to improve their operations. Possible topics include the anticipation of deviations from scheduled power generation or consumption, the automated identification of optimal switching patterns to avoid congestion, or the use of dynamic line rating dependent on anticipated weather conditions. All of these would improve the operation of power grids and decrease costs. These topics have been addressed by individual publications, but the review shows that these are still rather isolated efforts and that the community should put a stronger focus on improving TSO and DSO operations.
Within the Consumption area, studies focus on four applications: Forecasting, Analysis, Clustering and Control. Forecasting has recently seen a strong increase in research interest and represents the largest field of application in this area. Short-term probabilistic and peak consumption forecasting can be expected to gain importance in future systems with increasing grid congestion problems, higher loads from EVs at the distribution grid level, and more opportunities for DR. Similarly, in decentralized electricity systems, short-term forecasting of consumption at a distributed level gains importance. The volatility of individual consumption poses a unique challenge. Consequently, we can observe that successful individual consumption forecasting typically requires especially thorough feature selection and larger shares of the data set for training. On both levels, state-of-the-art hybrid approaches and deep learning approaches usually outperform ARIMA models.
Long-term forecasting is challenging, especially as it highly depends on socio-economic and economic factors, which themselves need to be forecast for a proper prediction. Probabilistic forecasts are rare in this area and thus leave some potential for the future. Overall, simpler models such as ARIMA and linear regressions perform well but can be outperformed by well-specified and trained neural network approaches.
The most prominent sub-field of consumption analysis is the classification and segmentation of customers. To improve comparability and convergence, future studies are encouraged to adopt rigorous benchmarking. In this regard, researchers can learn from best practices in, e.g. Consumption Forecasting research.
For Clustering, the choice of the most suitable approach highly depends on the use case at hand. There is a general trade-off between cluster stability and prediction accuracy. The availability of metainformation and additional external features are shown to substantially simplify and improve the clustering process.
Furthermore, in the light of increasing electrification of industry processes, efficiency goals, and the spread of time-varying tariffs and peak demand charges, automated, Data Analytics-driven consumption control can be expected to gain further relevance in the future. Data Analytics is also a key pre-step for efficient consumption control. One key challenge is the adequate selection of input features. Here, researchers can adopt recent advances from other fields, e.g. short-term forecasting of individual consumption.
System-related Data Analytics covers multiple areas in an integrated fashion. Only a small number of the retrieved high-impact studies covers such applications. Especially ANN- and ARMA-based models have shown good performance for various system-related forecasting tasks. Given the rise of microgrids, local electricity markets, and multi-energy systems, we predict a rising need and potential for integrated System Data Analytics applications across multiple areas, as we outline below. This is a chance for researchers from different backgrounds to work together to find innovative solutions for the future energy system.

Outlook
In the near future, several major trends will affect the electricity sector and influence the development of Data Analytics in this context.

Integration of Electricity Value Chain Areas
First, Data Analytics will need to acknowledge the increasing interconnectedness of areas along the electricity value chain. Larger shares of varying Generation from renewables have an increasing impact on Transmission and Distribution capacity and wholesale Trading. Under time-varying electricity tariffs, these effects will be passed on to short-term end-consumer prices. Those prices, in combination with higher flexibility from EVs, stationary storage and automated DR, will in turn impact Consumption. Hence, approaches that are applicable to multiple areas and integrated analyses can provide additional value.

Integration of Energy Sectors
In the future, the electricity sector will be increasingly coupled with other energy sectors like mobility, heat, and gas, leading to multi-energy systems. Data Analytics can be expected to play a crucial role in the transition and integration of these sectors. For instance, heat pumps, air conditioning and EVs will have a considerable effect on forecasts of individual households' consumption. Therefore, future work could expand the present review to other energy sectors and the aspect of sector coupling.

Decentralization of Generation and Consumption
In the upcoming years, new applications for Data Analytics at the interface to consumers may move further to the center of research attention. If the adoption of rooftop solar panels, residential batteries, EVs, electric heat pumps, smart home appliances and smart metering infrastructure continues to rise, both the availability of data and the need for new solutions will grow substantially. For example, granular smart meter data can be used for forecasts on the building level with probabilistic approaches to capture the specific time dependencies of individual consumption. If a household has solar generation, consumption forecasts can be expanded to prosumption forecasts. Such forecasts enable new solutions for future decentralized challenges like grid congestion, optimized charging of batteries and EVs, and the design and recommendation of spatially and time-varying electricity tariffs. As Data Analytics applications move closer to individuals, they have to increasingly consider consumer anonymity [214] and human behavior [4]. An outline of the potential future interplay between Data Analytics research and Behavioral Energy Economics research is given in Staudt et al. [215].

Democratization of Data Analytics
Finally, we observe that the key resources needed to conduct Data Analytics research are becoming available to a broader community. Looking at the programming languages that the studies in our qualitative analysis pool report, Open Source programming languages like Python and R are becoming increasingly popular (see Fig. 7). Moreover, some institutions and researchers make their data sets accessible and legally usable as so-called Open Data. Papers in our review have, for example, used freely available data from Open Power System Data [216] and ENTSO-E [217] on generation, prices, and consumption. For further related data sources and a discussion of the legal aspects of sharing and using electricity system data, we refer to Hirth [218].
Besides data, publishing newly developed models, including the associated code, increases reproducibility (Open Methodology). Last, the progress of Data Analytics is fostered by Open Educational Resources like online courses as well as freely available cloud computing resources [219], which make increasingly complex models possible.
Given these major push and pull trends, the importance of Data Analytics and Artificial Intelligence in the energy sector is set to grow further. With the analysis provided in this work, we hope to help researchers in finding inspiration for new ideas and to facilitate successful future research.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.