Electrical Load Forecasting Models for Different Generation Modalities: A Review

The intelligent management of power in electrical utilities depends heavily on load forecasting models. As industries become digitalized, power generation is supported by a variety of resources, and the forecasting accuracy of different models therefore varies. Power utilities with different generation modalities (DGM) face complexities and a noticeable amount of error in predicting future electricity consumption. To manage the power flow effectively with negligible interruptions, a utility must use forecasting tools that predict future electricity demand with minimum error. Because the current literature addresses load forecasting for individual or a limited number of power sources, utilities with multiple power sources, or DGM, remain unexplored. An exploration of the existing literature is therefore needed to analyze the models that could be considered for load forecasting in DGM. This paper explores state-of-the-art methods recently used for electrical load forecasting, highlighting common practices, recent advances, and areas open for improvement. The review investigates the methods, parameters, and sectors considered for load forecasting. It analyzes the models in depth and discusses their strengths, weaknesses, and error percentages. It also highlights the peculiarities of methods used in the residential, commercial, industrial, grid, and off-grid sectors to help researchers appraise common practices. Moreover, trends and research gaps are discussed.


I. INTRODUCTION
The uninterrupted and stable supply of power is the fundamental requirement of an electrical power generation system. Meeting the load demand requires analyzing the history, behavior, trends, and factors affecting the load in order to predict future demand, a process termed load forecasting (LF). LF is considered crucial for determining the gaps between supply and demand in electrical power generation. It helps management understand the variables influencing energy consumption, growth, and waste. It is equally crucial for power control and exchange among interconnected power systems, since it plays a fundamental part in bridging disparities and addressing concerns about energy demand [1].
Recent developments in electrical systems, which integrate several power resources (including renewables) to generate electricity, have made load forecasting more complex. The significant integration of renewable energy (solar, wind, thermal, etc.) has pushed load forecasting into a more challenging phase, making it harder for power professionals to interpret electrical generation, supply, and reserve and to manage power demand [2]. This complexity stems from non-linear and non-stationary load behavior, so forecasting requires detailed analysis of the many factors that affect the procedure directly or indirectly. Despite the benefits of forecasting techniques, challenges in prediction accuracy remain.
Due to rapid change in electrical load consumption patterns, it is important for forecasting models to learn and interpret the
TABLE 1. List of related literature reviews covering different aspects of load forecasting with different review types and targets.

Reference | Year | Review Type | Review Target | Remarks
[7] | 2016 | Tutorial | Probabilistic approaches in load forecasting | The study presents a tutorial review on probabilistic load forecasting. It discusses forecasting techniques and methodologies and provides a research direction in probabilistic load forecasting.
[8] | 2017 | Systematic | Identification of scenario-based models for load forecasting | The study presents a systematic review of load forecasting methods. It focuses on identifying the scenarios to which particular models are suited and presents a taxonomy for selecting a model according to the problem.
[9] | 2017 | Comparative | Architecture of RNNs for real-valued time series | The study compares different RNNs for STLF and presents their important architectures and properties.
[10] | 2019 | Methodological | Computational intelligence based STLF | The study provides a framework for different computational intelligence methodologies, using several case studies as examples. It also gives an overview of the difficulties experienced in STLF, with possible strategies to tackle them.
[11] | 2020 | Comprehensive | Single and hybrid methods based on ML | The study covers single, hybrid, and combined STLF methods, although its hybrid models primarily target SVM and ANN. It also presents the advantages, disadvantages, and functions of these models.
Proposed | 2021 | Narrative | ML approaches used for STLF for conventional and different generation modalities | The study presents a combined review differentiating the residential, commercial, industrial, grid, and off-grid sectors. It highlights the limitations of existing load forecasting methods, discusses possible strategies to overcome them, and presents possibilities for tailoring existing load forecasting models to the requirements of different generation modalities. It also highlights possible research gaps as recommendations for future work.
Forecasting techniques used in LTLF include multiple linear regression (MLR), which models the relationship between load and external variables using hourly information to improve predictive modeling, analyze different scenarios, and normalize weather data [13]. Some methods rely on sub-load profiling through individual or cluster aggregation, which helps short-term load forecasting but is unreliable for LTLF because external parameters drive load variations. A study of Australia's electricity market discussed changes in population, technology, weather and economic conditions, and individual electricity consumption. The relationship between load and the factors affecting it was analyzed using semi-parametric models, and bootstrapping with the variable block method was used. The study predicted the annual and weekly demand peaks for South Australia [14].
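As a rough illustration of the MLR idea described above, the following Python sketch fits load as a linear function of external variables by ordinary least squares (normal equations solved with Gaussian elimination). The predictors (temperature and hour of day), the helper names, and the toy data are hypothetical assumptions chosen for illustration; they are not the formulation or data of [13], and a practical model would normally rely on a numerical library and many more variables.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(features: Sequence[Sequence[float]], load: Sequence[float]) -> List[float]:
    """OLS via the normal equations (X^T X) beta = X^T y.

    An intercept column of ones is prepended; returns [b0, b1, ..., bk].
    """
    x = [[1.0, *row] for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, load)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    """Predicted load for one row of external variables."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Hypothetical toy data generated exactly from load = 50 + 2*temperature + 1.5*hour (MW).
temps = [10.0, 15.0, 20.0, 25.0, 30.0, 12.0]
hours = [0.0, 6.0, 12.0, 18.0, 23.0, 9.0]
rows = [[t, h] for t, h in zip(temps, hours)]
loads = [50 + 2 * t + 1.5 * h for t, h in rows]
beta = fit_mlr(rows, loads)
print([round(b, 6) for b in beta])
```

Because the toy data are noise-free, the fitted coefficients recover the generating values; with real load data the residuals would carry the unexplained variation.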
Reference [15] employs decomposition, bootstrap aggregation, and the autoregressive integrated moving average (ARIMA) model for univariate forecasting of monthly electricity demand. Reference [16] employed particle swarm optimization (PSO) to minimize the error of the associated parameters when forecasting annual peak load, using data from the Egyptian and Kuwaiti networks. Reference [17] forecast China's per-capita electricity consumption using expert prediction and a fuzzy Bayesian model to resolve functional LTLF problems.
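A minimal sketch of how PSO can search for forecasting-model parameters that minimize an error measure is shown below. It assumes a hypothetical linear trend model for annual peak load, synthetic data, and common textbook PSO coefficients (inertia 0.7, acceleration 1.5); none of these are taken from [16]. For this toy model a closed-form least-squares fit exists, and PSO is shown only because the same loop also works for objectives that are non-linear or non-differentiable.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(objective: Callable[[Sequence[float]], float],
        bounds: Sequence[Tuple[float, float]],
        n_particles: int = 30, n_iter: int = 300,
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO that minimizes `objective` inside the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    best = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best][:], pbest_val[best]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's own best + pull toward the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # keep inside bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical synthetic peak loads (MW) over 10 years: P(t) = 1800 + 60 * (t - t_mean).
years = list(range(10))
t_mean = sum(years) / len(years)
peaks = [1800 + 60 * (t - t_mean) for t in years]


def mse(params: Sequence[float]) -> float:
    """Mean squared error of the trend model P(t) = a + b * (t - t_mean)."""
    a, b = params
    return sum((a + b * (t - t_mean) - p) ** 2 for t, p in zip(years, peaks)) / len(years)


best_params, best_err = pso(mse, bounds=[(0.0, 4000.0), (-200.0, 200.0)])
print(best_params, best_err)
```

Centering the year index keeps the two parameters nearly decoupled in the error surface, which makes the toy search well conditioned.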

B. MEDIUM-TERM LOAD FORECASTING
MTLF covers horizons from a month to a year, which usually include planning grid maintenance, considering electricity prices, organizing energy sharing, and scheduling fuel. Several models have been employed for MTLF tasks. Reference [18] used a data pre-processing model that eliminates irrelevant features through feature selection based on informative knowledge, combining the Jaya optimization algorithm with a conditional restricted Boltzmann machine (CRBM) to forecast monthly electrical load for smart homes. Reference [19] presents a comprehensive survey of the techniques employed at a Nigerian electric utility, where three regression techniques (linear, compound growth, and quadratic) were applied independently, helping the Ibadan Electricity Distribution Company plan and manage energy more efficiently.
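The three regression trends mentioned for [19] can be sketched as follows: linear, compound growth (fitted as a linear regression on the logarithm of load), and quadratic trends are fitted to a monthly demand series, and the model with the lowest in-sample squared error is selected. The synthetic data, the selection rule, and all function names are illustrative assumptions, not the procedure used by the Ibadan utility.

```python
import math
from typing import Dict, List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly_fit(t: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    k = degree + 1
    ata = [[sum(ti ** (i + j) for ti in t) for j in range(k)] for i in range(k)]
    aty = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(k)]
    return solve(ata, aty)


def fit_trends(t: Sequence[float], y: Sequence[float]) -> Dict[str, Tuple[List[float], float]]:
    """Fit the three trend models; return {name: (parameters, in-sample SSE)}."""
    lin = poly_fit(t, y, 1)
    quad = poly_fit(t, y, 2)
    # Compound growth y = a * (1 + r)^t  <=>  ln y = ln a + t * ln(1 + r).
    ln_a, ln_g = poly_fit(t, [math.log(v) for v in y], 1)
    a, r = math.exp(ln_a), math.exp(ln_g) - 1.0
    models = {
        "linear": (lin, lambda x: lin[0] + lin[1] * x),
        "quadratic": (quad, lambda x: quad[0] + quad[1] * x + quad[2] * x * x),
        "compound growth": ([a, r], lambda x: a * (1.0 + r) ** x),
    }
    results: Dict[str, Tuple[List[float], float]] = {}
    for name, (params, f) in models.items():
        sse = sum((f(ti) - yi) ** 2 for ti, yi in zip(t, y))  # error in original units
        results[name] = (params, sse)
    return results


# Hypothetical monthly energy demand (GWh) growing 2 % per month from 200 GWh.
months = [float(i) for i in range(12)]
demand = [200.0 * 1.02 ** m for m in months]
fits = fit_trends(months, demand)
best = min(fits, key=lambda name: fits[name][1])
print(best, fits[best][0])
```

Selecting on in-sample error alone favors flexible models; a hold-out month or two would be a more cautious selection rule.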
References [20] and [21] discuss further MTLF models over different time ranges. Reference [20] uses a time-dependent convolutional neural network (TD-CNN) and a cycle-based long short-term memory (C-LSTM) network to improve MTLF, while [21] applies machine learning variants, namely ANN, linear regression (LR), and an adaptive boosting model, to district-level forecasts classified into monthly, seasonal, and yearly horizons.

C. SHORT-TERM LOAD FORECASTING
STLF, which covers horizons from a few minutes to hours or days, is a significant factor in the day-to-day operation and planning of a power utility and a critical component of an energy management system. Accurate STLF reduces financial costs and operational risks, directly yielding savings. It is therefore given much prominence and treated as a critical problem in the competitive energy market. Several STLF techniques have been proposed, some of which are classified as statistical methods, including LR, ARIMA, and exponential smoothing (ES) [22]. Later, researchers introduced newer STLF methods, including ANN, fuzzy logic (FL), SVM, recurrent neural networks (RNN), and LSTM [23].
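Exponential smoothing, one of the statistical STLF methods listed above, can be sketched in a few lines. The following shows simple exponential smoothing only (no trend or seasonal terms), with a small grid search for the smoothing factor; the toy hourly data and function names are hypothetical, and practical STLF models usually add seasonal components.

```python
from typing import List, Sequence, Tuple


def simple_exponential_smoothing(series: Sequence[float], alpha: float) -> Tuple[float, List[float]]:
    """Return (next-step forecast, one-step-ahead forecasts for series[1:]).

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, initialized with y_0;
    the forecast for step t + 1 is level_t.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    in_sample = []
    for y in series[1:]:
        in_sample.append(level)  # forecast made before observing y
        level = alpha * y + (1.0 - alpha) * level
    return level, in_sample


def choose_alpha(series: Sequence[float], grid: Sequence[float]) -> float:
    """Pick the alpha from `grid` with the lowest one-step-ahead squared error."""
    def sse(alpha: float) -> float:
        _, fc = simple_exponential_smoothing(series, alpha)
        return sum((f - y) ** 2 for f, y in zip(fc, series[1:]))
    return min(grid, key=sse)


hourly_load = [100.0, 110.0, 120.0]  # hypothetical MW values
forecast, fitted = simple_exponential_smoothing(hourly_load, alpha=0.5)
print(forecast, fitted)  # 112.5 [100.0, 105.0]
```

On a steadily rising series a large alpha tracks the level best, which also shows why simple smoothing lags trending load and why trend terms are often added.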
The operation and planning of a power utility, among its major driving factors, depend on STLF. Because STLF spans minutes to days, it is considered crucial for a utility's decisions on load control, dispatch, and secure power system operation. The digitalization of power utilities has driven the adoption of multiple power resources, so utilities with different generation modalities need an efficient and adaptable load forecasting model. Since DGM involves multiple power resources, STLF plays a critical role in directing when resources are shut down or started up. It also helps management distribute power efficiently among the DGM sources. Section V discusses various STLF methods and their respective applications in detail.

D. VERY SHORT-TERM LOAD FORECASTING
VSTLF addresses horizons ranging from minutes to an hour. Because VSTLF targets the near future, which influential factors are considered varies, but time and temperature are most commonly used, together with current load values, to forecast future values. VSTLF depends primarily on the recent load pattern, whereas STLF, MTLF, and LTLF depend more on multivariable relationships. Methods used for VSTLF include classical statistical and computational intelligence techniques, some of which have been applied to forecasting in district buildings [24].
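Because VSTLF depends mainly on the recent load pattern, a common baseline is an autoregressive (AR) model that predicts the next value from the last few observations. The sketch below fits an AR(p) model with an intercept by least squares; the lag order, the synthetic series with a 12-sample cycle, and the function names are illustrative assumptions rather than the methods of [24].

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(series: Sequence[float], p: int) -> List[float]:
    """Least-squares AR(p) with intercept: y_t = c + sum_i phi_i * y_{t-i}.

    Returns [c, phi_1, ..., phi_p].
    """
    rows = [[1.0] + [series[t - i] for i in range(1, p + 1)] for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def forecast_next(series: Sequence[float], coeffs: Sequence[float]) -> float:
    """One-step-ahead forecast from the most recent p observations."""
    p = len(coeffs) - 1
    return coeffs[0] + sum(coeffs[i] * series[-i] for i in range(1, p + 1))


# Hypothetical synthetic load (kW) with a 12-sample cycle around 500 kW.
load = [500.0 + 100.0 * math.cos(2 * math.pi * t / 12) for t in range(48)]
coeffs = fit_ar(load, p=2)
print([round(c, 3) for c in coeffs], round(forecast_next(load, coeffs), 3))
```

A pure cosine obeys an exact AR(2) recurrence, so this toy fit recovers phi_1 = 2cos(pi/6), phi_2 = -1 and forecasts the next point (600 kW); real load needs more lags or exogenous inputs.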

III. DIFFERENT GENERATION MODALITIES
The cumulative generation of electrical energy through a combination of two or more conventional and renewable power resources is termed different generation modalities (DGM). In a DGM environment, the generation of each source is managed through a load demand forecast produced collectively for these modalities. Each modality contributes its own share and keeps enough reserve to provide support in the event of equipment failure. Conventional energy sources include coal, nuclear, hydro, oil, and gas, whereas renewable energy sources include thermal, wind, and solar energy, among others. DGM has multiple applications, such as the smart grid (SG), power utilities, and independent power producers (IPPs). A DGM can combine any two or more of the above-mentioned power resources for energy generation (e.g., coal, gas, solar, and wind).
As per-capita energy consumption has increased, energy demand has risen enormously, affecting the environment and creating problems of energy generation and shortage [25]. Such issues highlight the importance of DGMs in the SG, which can manage energy more efficiently [26]. Including multiple power resources strengthens power system sustainability, improves the system's ability to meet demand, and helps reduce costs [27]. Alongside these benefits, it raises challenges for power professionals, as it increases system complexity. Energy consumption is affected by several factors, and so is generation [28]. Because DGM involves multiple factors, correlational analysis becomes more complex [29].
In electrical load forecasting for conventional and hybrid renewable generation, models that can handle non-linear and multivariate data have outperformed other approaches [11]. Most of these are deep learning models, including ANN, DNN, CNN, and LSTM [5], [10], [11], [29], [37]. The modalities forecasted in recent papers, along with suitable algorithms, are listed in Table 7. Various hybridizations of these methods with optimization algorithms (discussed in Section IV) have also been proposed to improve prediction accuracy. However, these models still need modification, expansion, and alteration to cope with existing complexities such as adaptivity, the introduction of new parameters, sudden sags or lags, and stress on a single generating system. Despite their improved performance, these algorithms in their current form are not suitable for environments such as DGMs, since a DGM has multiple sources, each with its own set of parameters. Therefore, at this stage, forecasting each source's contribution individually is better for anticipating the generation requirement than a combined forecast.
In hybrid power plants, the load demand is anticipated from previously accumulated load patterns (from both conventional and renewable sources) as a single-point accumulated forecast, together with the relevant parameters that vary the generation. This variation in generation from the sources is one of the major causes of distress in electrical systems.
As a result, one of the power sources may carry high volumes of load, encounter surges, and put meeting the demand at risk [30]. This risk is usually borne by conventional power sources, since the generation of renewables is affected by a wide range of parameters.
In DGMs, by contrast, the contribution of each power source is forecast individually from its previous patterns and the relevant parameters that vary its generation. Each forecast captures the contribution of a single source, and the individual forecasts are then accumulated into the total forecast. This yields a multipoint forecast with a final accumulated forecast. Because the power sources are treated equitably, no single resource is overstressed. If the output of any DGM source varies, the requirement is distributed equally among the other power sources to manage the demand effectively.
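The multipoint scheme described above, where each source is forecast individually, the individual forecasts are accumulated into a total, and a shortfall from one source is shared equally among the others, can be sketched as follows. The source names, capacities, and the equal-share rule with capacity limits (sources that reach capacity drop out and the remainder is re-shared) are hypothetical assumptions used only to make the description concrete; they are not a dispatch algorithm proposed in the reviewed literature.

```python
from typing import Dict, Tuple


def accumulate(source_forecasts: Dict[str, float]) -> float:
    """Total forecast as the sum of the individual per-source forecasts."""
    return sum(source_forecasts.values())


def redistribute_shortfall(schedule: Dict[str, float], capacity: Dict[str, float],
                           failed: str, available: float) -> Tuple[Dict[str, float], float]:
    """Share a source's shortfall equally among the other sources.

    Sources that reach capacity drop out and the remainder is re-shared among
    those with headroom. Returns (new schedule, unmet demand).
    """
    new = dict(schedule)
    shortfall = new[failed] - available
    if shortfall <= 0:
        return new, 0.0  # the source can still deliver its scheduled share
    new[failed] = available
    remaining = shortfall
    active = [s for s in new if s != failed and capacity[s] - new[s] > 1e-9]
    while remaining > 1e-9 and active:
        share = remaining / len(active)
        still_active = []
        for s in active:
            take = min(share, capacity[s] - new[s])
            new[s] += take
            remaining -= take
            if capacity[s] - new[s] > 1e-9:
                still_active.append(s)
        active = still_active
    return new, max(remaining, 0.0)  # unmet > 0 would call for load shedding


# Hypothetical per-source forecasts (MW) and capacities.
schedule = {"coal": 300.0, "gas": 200.0, "solar": 100.0, "wind": 90.0}
capacity = {"coal": 500.0, "gas": 300.0, "solar": 150.0, "wind": 100.0}
print(accumulate(schedule))  # 690.0
new_schedule, unmet = redistribute_shortfall(schedule, capacity, "solar", 40.0)
print(new_schedule, unmet)
```

Here the 60 MW solar shortfall is split 20 MW each, wind saturates after 10 MW, and coal and gas absorb the remaining 10 MW equally, so the accumulated total is unchanged.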
Since the electrical load forecast is crucial for operations and planning, energy management suffers if management cannot foresee the deviations of contributing resources precisely and accurately [31]. Modern power systems face several challenges, including peak demand, which is another source of non-linear, noisy, and distinctive patterns in load anticipation. Peak demand can result from meteorological variables, large consumer loads, large industrial operations, and the increasing adoption of electric vehicles (EVs) [32]. These peak demands are more likely to be met through conventional means or calculated DES. Table 2 summarizes the differences between hybrid power plants and DGMs according to several criteria, to clarify the concept and show its importance in the smart grid environment.
A facility's power plant is one of its most crucial departments, enabling the other departments to carry out their operations on time. For this purpose, an appropriate approach to anticipating future requirements is needed to prevent downtime at an economical cost. The load forecast is its main driving factor and plays a vital role in deciding how and when to expand the power facility. Fig. 2 gives a sample flowchart of the electrical load forecasting model in the DGM environment. Every aspect has its own benefits, affecting factors, and confrontations that should be addressed accordingly. This section describes these aspects in detail, which could lead to improved conventional and DGM load forecasting models. The limitations of existing models with respect to the DGM environment are presented in Section VI.

A. IMPORTANCE OF LOAD FORECASTING
Besides anticipating future load demand, load forecasting benefits numerous other sectors. It directly supports power-related organizations in their calculations and helps non-power sectors plan their operations accordingly. It also plays an important role in laying the foundation of the SG and the smart city.
The benefits of load forecasting for operations, planning, and investment, including market bidding, scheduling of controllable loads, generation, fuel purchases, load switching, and infrastructure maintenance scheduling, have highlighted its importance and prompted utilities to exploit it effectively. Table 3 lists some benefits of conventional and DGM load forecasting, with their effects on the beneficiaries.
Table 3. Benefits of electrical load forecasting in conventional and DGM environments.

S. No | Benefit | Government | Environment | Power Utility | Industrial | Commercial | Residential | Remarks on Benefits of Load Forecasting
1 | Load demand anticipation | | | | | | | It guides the beneficiaries' decisions by anticipating load demand, allowing utilities to plan future generation and transmission at economical cost on the basis of a valid forecast.
2 | Operations and planning | ✓ | - | ✓ | ✓ | - | - | It guides the beneficiaries to conduct operations and planning according to accurate predictions, avoiding disruptions, and helps decide how to manage the modalities to keep operations economical.
3 | Infrastructure and maintenance | ✓ | - | ✓ | ✓ | - | - | It helps the beneficiaries understand increasing demand and act in time to avoid downtime. It also helps identify needed infrastructure improvements and schedule the maintenance required for smooth operations.
4 | Purchase and saving of fuel | - | - | ✓ | ✓ | - | - | An accurate forecast helps to pre-negotiate fuel purchases and stock the warehouse according to the expected electricity demand, yielding savings by anticipating requirements early.
5 | Resource and reserve calculation | ✓ | - | ✓ | ✓ | - | - | Early demand anticipation helps utilities manage the spinning and non-spinning reserves of the system for sudden demand.
6 | Over- and under-energy preparation | ✓ | - | ✓ | ✓ | - | - | It projects future demand, which helps to manage uncertainties and production facilities effectively.
7 | | | | | | | | It helps utilities observe consumption trends to develop guidelines for expanding the generation facility or adding a new resource to bridge the forthcoming gap.
8 | Uninterrupted power supply | | | | | | | An accurate forecast helps to manage the power supply intelligently, guiding the utility to avoid power surges, spikes, sags, dips, and outages.
9 | Loss, waste, and theft calculation | ✓ | - | ✓ | - | - | - | It provides an understanding of the issues that cause the system to overload.
10 | Renewable energy contribution | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | It enables effective integration of renewable energy by revealing load behavior in different time intervals.
11 | IPPs' contribution | | | | | | | It enables the beneficiaries and independent power producers to understand their generation segment and avoid a cascading collapse effect.
12 | Industrial digitalization | ✓ | ✓ | ✓ | ✓ | ✓ | - | It empowers industries to manage their operations smartly based on intelligent forecasting.
13 | Smart grid implementation | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | It helps utilities understand consumption patterns and design the power flow accordingly to manage distributed energy resources smartly.
14 | Smart city implementation | | | | | | | It is considered a founding pillar of the smart city, foreseeing the exponentially increasing demand and mapping it to relevant factors to manage sources efficiently and produce power intelligently.
15 | Investment and market bidding | | | | | | | It helps the beneficiaries manage investment and bidding smartly using the preprocessed information provided by forecasting.
16 | Reduction in CO2 emission | | | | | | | It enables dense integration of renewables, using different energy resources at different times to reduce CO2 emissions.
17 | Preservation of fossil fuels | | | | | | | It reveals the demand peaks and stable hours in the consumption pattern, enabling the use of the appropriate energy resources and helping to slow the depletion of fossil fuels.

Table 2. Differences between hybrid power plants and power plants with DGMs.

Criterion | Hybrid power plants | Power plants with DGMs
Type | A combination of generation or storage resources (e.g., solar and wind), or a single conventional power source with one or two renewable energy sources. | Multiple conventional and multiple renewable energy sources.
Load balancing | The load is usually balanced between a single conventional and one or two renewable power resources. | The load is usually balanced between multiple conventional and multiple renewable power resources.
Bad weather | The conventional power resource is stressed with more load. | The load is balanced between two or more conventional power resources.
Worst weather | An additional power resource is required, or load shedding is performed to keep the system running at maximum output. | No additional power resource is required, since the load is balanced across multiple conventional power resources.
Dependency | High dependency on the renewable power resource. | Low dependency on any single power resource.
Risk | Moderate risk of spikes and load surges on the electrical system. | Low risk of load surges on the electrical system.
Forecast | Single-point forecasting. | Multiple-point forecasting.
Error | Moderate error rate due to the single-point forecast. | Minimal error rate due to multiple-point forecasting.
Backup | Supported by either batteries or a single conventional source. | Supported by multiple power sources.
Source failure | One of the systems is heavily stressed, which could cause a cascading failure. | The system is not stressed, so there is no cascading impact.
Smart grid | Such utilities face difficulties managing two-way power flow and smart power management. | Two-way power flow is managed effectively, making DGMs a fundamental part of SGs.
Smart cities | Such utilities cannot be part of smart cities due to their complex behavior in dealing with power failure. | Such utilities are considered essential for implementing smart cities, since multi-source generation minimizes the impact of power failures on consumers.

B. IMPACT OF INFLUENTIAL FACTORS
The electrical load has several direct and indirect relationships with certain factors that affect load forecasting. If these factors are considered carefully during model development, they directly improve the precision and accuracy of the predictions. Because both the electrical load and the forecasting model depend on these factors, researchers need to identify the factors most pertinent to their model and build a relevant scenario of the electrical consumption pattern. The correlations of load with time and with temperature are presented in Fig. 3 and Fig. 4, respectively, using the GEFCom2014 dataset. Some of the significant factors affecting load forecasting models in conventional and DGM approaches are briefly explained as follows:
1. Time: The relationship between load and time is useful for pattern recognition and depends on the period considered (hourly, or shorter or longer), which should be taken into account during model development. Simulations covering every hour of the day showed that hourly load consumption changes, with different responses on each day of the week [30]. Reference [31] describes in detail how load behavior varies across hours, days, weeks, weekends, and months, and states that interpreting the data in terms of the preceding hours, together with their day and time, contributes to an accurate model. The time frame of the activities performed also plays an essential role in determining future load. A load-to-time correlation is presented in Fig. 3, which depicts the load changes in the time domain.
2. Meteorology: Meteorology, which comprises multiple variables, is one of the most dominant load forecasting factors. Temperature, humidity, wind speed, rain, cloud cover, and solar radiation all influence electricity consumption. A correlational interpretation of load and temperature is presented in Fig. 4, showing the changes in load consumption as temperature rises or falls. Depending on the meteorology of a region (see item 3, Region), consumption in summer may be lower than, equal to, or higher than consumption in winter. Temperatures above or below normal levels increase consumption, as air-conditioning use rises in summer and heating use rises in winter. Moderate temperatures in either season reduce consumption, which can cause over- or underestimation of the load forecast and challenges forecasters to find a model that adapts to these deviations. Researchers have proposed different subsets of these variables to maximize accuracy but have not yet achieved the desired results. Reference [32] discusses several studies that include the humidex, the wind chill index, and various weather uncertainties, showing the importance of meteorological inputs for better forecasts.
3. Region: Climates vary remarkably from region to region. Because regions have distinct meteorological conditions, their electricity consumption differs, so regional climatic variables need to be incorporated during model development to achieve high forecasting accuracy. Some countries experience all four seasons, some two or three, and some a single climate with minor changes. As a result, load demand varies; in particular, the load demand of tropical countries does not match that of other countries, and a model designed only for tropical countries would not be valid elsewhere unless it has regional adaptability. The region is therefore an important influential factor. Fig. 5 highlights how climates differ between countries, which changes their energy consumption levels; the color indicates the maximum, medium, or moderate temperature level in each region [33]. Reference [34] discussed a weather priority indexing approach and explained the effects of different regions within Iran, which fall into different weather categories throughout the year, across sixteen (16) regional electrical companies (RECs).
4. Application: A model designed for one application may perform poorly in another. A model designed for a small grid will not be efficient for large-scale consumers; similarly, a model designed for a grid-isolated network will perform differently on a grid-connected one. Therefore, the application also plays a role in model development [7].
5. Economic: Electricity has become a basic necessity of life, so its usage has an economic impact. The economic factor includes industrialization, load management, electricity pricing, and consumer behavior. The price of electricity relates directly to purchasing power and is set by tariffs that distinguish peak and off-peak hours. Appliances also play a role: developed and modernized countries use more advanced power-consuming devices than underdeveloped countries, so their loads also differ. Thus, a country's economy affects long-term load forecasting as it drives industrial development [35].
6. Events: Since November 2019, pandemic situations must be counted as an influential factor in matching demand and supply. Such situations increased load in the residential sector and decreased it in the commercial sector, while industrial load varied with country policies, as some industries closed their operations completely and others partially. In addition, weekday data differ from weekend, special-event, and holiday data. Reference [31] explains why these events should be considered and discusses their significance in detail by developing daily load profiles.
7. Historical information: Historical training data are of great importance, since they determine the features the model learns and set its trends. The data should therefore be varied enough that the model learns to its full capability before making predictions [36].
8. Data quality: Data quality plays an important role in electrical load forecasting. Erroneous data lead to poor forecasts and ultimately degrade model performance. Therefore, data quality should be maintained either before collection or when the data are used as model input. However, smart meters provide sufficient data that can be used directly [37].
9. Technology: Technological change has moved power systems toward sustainability, but it has also contributed to load forecasting errors, as increased renewable penetration and EVs add uncertainty and complexity to load prediction [37].
10. Distributed power resources: Power contributed by the commercial and residential sectors on the consumer side of the meter increases load forecasting errors, so meter-level load forecasting should be considered [38].
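To make the time, meteorology, and event factors above concrete, the sketch below turns a timestamp and a temperature reading into typical model inputs: calendar features, a holiday flag, and heating and cooling degree terms that capture the V-shaped load response to temperatures above or below normal described under Meteorology. The 18 °C balance-point temperature, the feature names, and the holiday list are hypothetical assumptions; in practice the balance point would be estimated for each region.

```python
from datetime import date, datetime
from typing import Dict, Iterable

BASE_TEMP_C = 18.0  # hypothetical balance-point temperature (deg C)


def load_features(ts: datetime, temp_c: float, holidays: Iterable[date] = (),
                  base_temp_c: float = BASE_TEMP_C) -> Dict[str, float]:
    """Calendar, event, and degree-day style inputs for one observation."""
    holiday_set = set(holidays)
    return {
        "hour": float(ts.hour),
        "day_of_week": float(ts.weekday()),  # Monday = 0 ... Sunday = 6
        "month": float(ts.month),
        "is_weekend": 1.0 if ts.weekday() >= 5 else 0.0,
        "is_holiday": 1.0 if ts.date() in holiday_set else 0.0,
        # Load tends to rise on both sides of the balance point (heating vs. cooling).
        "heating_degrees": max(base_temp_c - temp_c, 0.0),
        "cooling_degrees": max(temp_c - base_temp_c, 0.0),
    }


summer = load_features(datetime(2021, 7, 3, 14), 30.0)  # a Saturday afternoon
winter = load_features(datetime(2021, 1, 1, 8), 5.0, holidays=[date(2021, 1, 1)])
print(summer)
print(winter)
```

Splitting temperature into two one-sided terms lets a linear model such as MLR represent load that increases for both hot and cold deviations.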

C. CONFRONTATIONS
To modernize the conventional grid and transform it into an SG, researchers and scientists have worked across many areas of electrical engineering and developed many state-of-the-art load forecasting methods to improve forecasting efficiency and to create economic benefits in the generation, transmission, and distribution sectors. Researchers have continually refined these methods to improve model accuracy and precision and to establish an adaptable model for different generation modalities. Along the way, several confrontations were identified as obstacles to achieving an accurate model. Some of them are listed below:
1. Lack of consideration of influential meteorological factors, causing abrupt errors in load forecasting.
2. Lack of regional adaptability in load forecasting models, causing inaccurate forecasts with large errors, including economic impacts in different regions.
3. Consumer behavior analysis, including digital data collection from consumers.
4. Power surges, maintenance, losses, waste, and flawed data analysis, along with data validation.
5. Extreme penetration of distributed energy resources.
6. Model limitations in meeting the requirements of DGMs.
7. Load forecasting for grid-isolated power plants.
The confrontations above are among the most significant observed, and they have inspired researchers to continuously improve and develop new models that address challenges in data acquisition, data clustering, parameter definition, segregation of days by consumption pattern, special events, and holidays.
The current pandemic has also undermined the assumptions on which existing models were built by shifting the traditional working model to work from home. This has significantly affected the power industry: office load has moved to homes, hospitals, and quarantine centers, raising demand from residential, medical, and hospitality consumers, while demand from the remaining industries has fallen below forecast levels. An adaptable model that ingests current online data as a live streaming feed would be expected to outperform static models under such conditions.

IV. AN OVERVIEW OF CLUSTERING AND OPTIMIZATION TECHNIQUES
Clustering and optimization techniques have been widely used to extract information from data and to optimize parameters so that models receive adequate inputs. This section therefore summarizes some of the clustering and optimization techniques, with their models, features, and specifications, that are used in electrical load forecasting to improve prediction results and reduce the expected error.

A. CLUSTERING TECHNIQUES
Clustering techniques have been widely used in electrical load forecasting to organize raw data into a suitable format. They are used to analyze the data and remove unwanted data before forecasting (Fig. 6 illustrates the typical working of a data clustering algorithm). Clustering problems also arise in many other areas, such as data mining, pattern recognition, and statistical data analysis.
Clustering (supervised or unsupervised) also serves as a tool for gaining insight into data by detecting their characteristics, distribution, outliers, and noise. Requirements for data clustering include scalability, handling of different data types, discovery of clusters and attributes, high dimensionality, and noisy data, as well as producing interpretable, coherent, and usable results.
Clustering forms groups of data, known as clusters, such that elements in the same cluster are similar and elements in different clusters are highly dissimilar [39]. Numerous clustering techniques exist; some of them are presented in Fig. 8.
Distance-based methods comprise two families: partitioning methods, which create a single partition of the dataset, and hierarchical methods, which create a nested sequence of clusterings. These are further divided into k-means, k-median, and k-medoids, and into agglomerative and divisive methods, respectively. Density-based methods, which cluster dense regions of objects and flag noise, include density-based spatial clustering of applications with noise (DBSCAN).
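As an illustration of partitioning clustering, the sketch below groups daily load profiles with plain k-means (Lloyd's algorithm). It is not taken from any of the reviewed studies; the short four-point profiles, the number of clusters, and the seeded random initialization are assumptions chosen for brevity.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length load profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles):
    """Element-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def kmeans(profiles, k, iterations=100, seed=0):
    """Lloyd's k-means on load profiles; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct profiles chosen at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_profile(members)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels
```

For example, three "evening-peak" profiles such as [1, 1, 5, 5] and three "morning-peak" profiles such as [4, 4, 2, 2] end up in two separate clusters, whose centroids can then serve as typical day types.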
HDBSCAN, a recent extension of DBSCAN that incorporates hierarchical clustering, has also gained wide acceptance; DENCLUE and OPTICS are other extended versions of density-based clustering [39]- [40]. Finally, grid-based methods use a multi-resolution grid data structure (with sub-categories such as STING, WaveCluster, and CLIQUE), probabilistic-generative methods use the expectation-maximization algorithm to estimate the probability of the underlying data points, and high-dimensional methods use subspace clustering for dimensionality reduction.
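The density-based idea, including how DBSCAN separates noise, can be sketched in a few lines. The function below is a minimal O(n²) DBSCAN over one-dimensional readings such as hourly loads; restricting it to 1-D values and the chosen eps and min_pts are assumptions for illustration. Values that are not density-reachable from any core point keep the label -1, i.e., noise.

```python
def dbscan_1d(values, eps, min_pts):
    """Minimal DBSCAN on 1-D values; returns a label per value (-1 = noise)."""
    n = len(values)
    labels = [None] * n  # None = not yet visited

    def region(i):
        # Indices within eps of value i, including i itself.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(neighbors)
        cluster += 1
    return labels
```

For the readings [10, 11, 10.5, 11.2, 30, 31, 30.5, 99] with eps=1.5 and min_pts=3, this returns [0, 0, 0, 0, 1, 1, 1, -1], isolating the 99 reading as noise.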

B. OPTIMIZATION ALGORITHMS
Optimization algorithms have recently been widely used, many of them inspired by nature, which remains largely unexplored yet solves complex optimization problems remarkably well. Some of the nature-inspired optimization algorithms explored by scientists and researchers are shown in Fig. 7. They include the artificial bee colony (ABC) algorithm, which is based on the foraging behavior of honeybees and the way their colonies are organized. The bees are divided into three groups in their social organization: employed bees (exploiting food sources), onlooker bees (making exploration decisions), and scout bees (exploring new food sources when needed).
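The three bee roles map directly onto the phases of one ABC cycle. The following is a minimal sketch of a standard ABC minimizer over box bounds, not the variant of any specific reviewed study; the parameter values, the sphere objective used for testing, and the choice to protect the current best source from abandonment (instead of memorizing it separately) are assumptions. In load forecasting, `f` would typically be a validation error as a function of model hyperparameters.

```python
import random


def abc_minimize(f, bounds, n_sources=10, limit=20, cycles=200, seed=1):
    """Minimal artificial bee colony (ABC) minimizer over box bounds.

    Returns (best_position, best_cost). The current best source is never
    abandoned, a simplification of keeping a separate memory of the best.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def fitness(cost):
        # Usual ABC fitness transform for a minimization objective.
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    sources = [random_source() for _ in range(n_sources)]
    costs = [f(s) for s in sources]
    trials = [0] * n_sources

    def search_near(i):
        # Move one coordinate of source i relative to a random partner k != i.
        k = rng.choice([m for m in range(n_sources) if m != i])
        j = rng.randrange(dim)
        candidate = list(sources[i])
        candidate[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        lo, hi = bounds[j]
        candidate[j] = min(max(candidate[j], lo), hi)
        cost = f(candidate)
        if cost < costs[i]:  # greedy selection keeps the better source
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_sources):  # employed bees: one search per source
            search_near(i)
        weights = [fitness(c) for c in costs]
        for _ in range(n_sources):  # onlooker bees favor fitter sources
            search_near(rng.choices(range(n_sources), weights=weights)[0])
        best_i = min(range(n_sources), key=costs.__getitem__)
        for i in range(n_sources):  # scout bees replace exhausted sources
            if trials[i] > limit and i != best_i:
                sources[i] = random_source()
                costs[i] = f(sources[i])
                trials[i] = 0

    best_i = min(range(n_sources), key=costs.__getitem__)
    return sources[best_i], costs[best_i]
```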
Similarly, ant colony optimization (ACO) is based on the real foraging behavior of ants, which develop the shortest possible path between a food source and the nest. Ants communicate about food sources through pheromone trails laid by ants carrying food back to the nest; other ants detect these trails and follow them toward the food source [41]. Moreover, the genetic algorithm (GA) belongs to the family of evolutionary algorithms and is inspired by Charles Darwin's theory of natural evolution; it uses selection, crossover, and mutation operators to maintain diversity. Particle swarm optimization (PSO) was inspired by bird flocks and fish schools that collectively search for the best locations, i.e., optimum fitness [42].
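Several of the hybrid models reviewed below (e.g., PSO-SVM) use PSO to tune model parameters. The sketch below is a basic global-best PSO on a stand-in objective, not the variant of any particular study; the inertia weight and acceleration coefficients (w = 0.7, c1 = c2 = 1.5) are common textbook choices assumed here.

```python
import random


def pso_minimize(f, bounds, n_particles=20, iterations=150,
                 w=0.7, c1=1.5, c2=1.5, seed=2):
    """Global-best particle swarm optimization over box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [list(p) for p in positions]
    personal_cost = [f(p) for p in positions]
    g = min(range(n_particles), key=personal_cost.__getitem__)
    global_best, global_cost = list(personal_best[g]), personal_cost[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best (cognitive) + pull toward swarm best (social).
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                                    + c2 * r2 * (global_best[d] - positions[i][d]))
                lo, hi = bounds[d]
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)
            cost = f(positions[i])
            if cost < personal_cost[i]:
                personal_best[i], personal_cost[i] = list(positions[i]), cost
                if cost < global_cost:
                    global_best, global_cost = list(positions[i]), cost
    return global_best, global_cost
```

In a PSO-SVM setting, each particle position would hold candidate hyperparameters (for instance, a penalty factor and a kernel width) and `f` would return the validation error of a model trained with them.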
Furthermore, the grasshopper optimization algorithm (GOA) is based on the swarming behavior of grasshoppers, in which nymphs (without wings) exploit the neighborhood while adults (with wings) locate better food regions [43]. The cuckoo search algorithm (CSA), based on the brood parasitism of cuckoos, is another of the many nature-inspired optimization algorithms used to solve engineering problems. All these optimization algorithms work iteratively until a satisfactory or optimum solution is reached.

V. LITERATURE REVIEW ON STLF
Numerous papers on electrical load forecasting are available in various databases; the literature reviewed here was drawn from the ScienceDirect, Web of Science, and Scopus databases. The papers mostly date from the past seven years, with journal papers making up most of the contributions, and cover the residential, commercial, power plant/industrial, grid, and off-grid sectors. The contribution of each database is shown in Fig. 9. Load forecasting has become the backbone of decision-making in the energy sector, as operations as a whole are strongly affected by it. Electrical load patterns and behavior are intrinsically transient and are affected by several parameters, which causes model efficacy to vary.
As discussed in Section II-C, power sector organizations rely on STLF for managing generation sources, power economics, start-up and shutdown forecasts, hourly operations, management and scheduling for cost savings, security assessment of power systems, and vulnerability analysis. STLF has therefore been explored (see Fig. 10) with methods that have been changed, upgraded, modified, or newly developed over time. Accordingly, this section presents an in-depth analysis of the different sectors considered in STLF studies.

A. RESIDENTIAL STUDIES
Several studies have been carried out on residential electricity consumers because of the intrinsic nature of their power consumption. Residential consumption patterns differ between countries and cities. Some of these studies are discussed below. Barman Mayur et al. [36] carried out four case studies investigating the impact of seasonal changes on residential load consumption and proposed incorporating seasonality effects into STLF in Assam, India. The study used the Firefly Algorithm (FA), SVM, and a season-specific similarity concept (SSSC), exploring different meteorological variables after plotting their correlations with load.
Season-specific and traditional approaches were compared in two parts: one varying the input factors and one varying the approach. The study found that including multiple meteorological inputs improved STLF to a certain degree. In a related study, Barman Mayur et al. [30] adopted a regional approach that incorporates the regional factors affecting STLF to improve model efficacy, and compared it with GA-SVM and PSO-SVM.
Weicong Kong et al. [37] discussed STLF for residential customers by performing data analytics on individual households and on the system load. Density-based clustering was used to evaluate the load data with respect to consumption and errors. LSTM was employed for prediction because of its pattern recognition ability, and it outperformed other machine learning algorithms. The forecasting topology forecasts each household individually from the provided dataset and then aggregates the forecasts, using smart grid data from Australia. These data made it possible to analyze residential consumption behavior, which was further divided into different data interpretation categories. The observations showed that customers differ in the consistency of their consumption, which challenges forecasting accuracy, and that accuracy drops significantly as the aggregation level decreases. Nevertheless, LSTM can improve forecasting accuracy despite inconsistent consumption.
Weicong Kong et al. [38] presented residential load forecasting based on an analysis of resident behavior. The model learns from appliance consumption data sampled every 30 minutes, relating consumer behavior to load, and uses LSTM for STLF, compared against a feed-forward neural network (FNN). The test scenarios differed in time intervals and selected appliances, and the authors report a significant improvement in STLF when appliance-level consumption is used.
To deal with the variability and uncertainty of future load profiles, Y. Wang et al. [44] presented a probabilistic approach to individual consumer load forecasting in which the LSTM is trained with the pinball loss. The study presents a multi-output LSTM that forecasts all quantiles in one network instead of one network per quantile, evaluated on an Irish dataset of 6000 residential profiles, from which 100 profiles were randomly selected, with lead times of 1/2, 1, 2, and 4 hours. The model was compared with a quantile regression neural network (QRNN) and a quantile gradient boosting regression tree (QGBRT). The dataset excluded weather data, which is a significant limitation when weather conditions change.
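The pinball loss for quantile level q penalizes under-prediction with weight q and over-prediction with weight 1 - q: L_q(y, ŷ) = max(q(y - ŷ), (q - 1)(y - ŷ)). The sketch below computes it in plain Python; summing the per-quantile losses to obtain a single training objective for a multi-output network is an assumption about how such a model is trained, not a detail taken from [44].

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-prediction (diff > 0) costs q per unit; over-prediction costs 1 - q.
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


def multi_quantile_loss(y_true, quantile_preds, quantiles):
    """Sum of pinball losses over several quantile forecasts of the same target."""
    return sum(pinball_loss(y_true, preds, q)
               for preds, q in zip(quantile_preds, quantiles))
```

For a true load of 10 kW and q = 0.9, predicting 8 kW costs 1.8 while predicting 12 kW costs only 0.2, which is why a high-quantile forecast is pushed above the median.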
Xinling Wang and Sung-Hoon Ahn [45] presented a framework for residential electrical load anomaly detection (RELAD) that combines a rule-engine-based load-anomaly detector (RE-AD) with a one-step-ahead load predictor (OSA-LP), applied to the load of about 44 residential customers in a rural area of Tanzania. OSA-LP combines ARIMA and ANN and uses the Bayesian information criterion (BIC) for model fitting and selection, whereas RE-AD combines SVM and kNN. Forecasting uses the past 24 hours of data.
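To show how the BIC guides model selection, the sketch below picks the order of a pure autoregressive model fitted by ordinary least squares, using BIC = n ln(RSS/n) + k ln n (the Gaussian-likelihood form up to an additive constant). Restricting the candidates to AR(p) models without differencing or moving-average terms is a simplifying assumption; it is not the OSA-LP implementation of [45].

```python
import numpy as np


def ar_bic(series, p, max_p):
    """BIC of an AR(p) model with intercept, fitted by least squares.

    All orders are fitted on the same samples (t >= max_p) so that their
    BIC values are comparable.
    """
    y = np.asarray(series, dtype=float)
    target = y[max_p:]
    n = target.size
    # Design matrix: intercept column plus lagged values y[t-1], ..., y[t-p].
    columns = [np.ones(n)] + [y[max_p - lag:len(y) - lag] for lag in range(1, p + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    k = p + 1  # intercept + p lag coefficients (noise variance adds a constant)
    return n * np.log(rss / n) + k * np.log(n)


def select_ar_order(series, max_p=6):
    """Return the AR order with the lowest BIC and the BIC of every order."""
    scores = {p: ar_bic(series, p, max_p) for p in range(max_p + 1)}
    return min(scores, key=scores.get), scores
```

The ln n penalty grows with the sample size, so the BIC favors more parsimonious models than criteria with a fixed penalty per parameter.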
F. Amara et al. [46] examined the relationship between temperature and electricity consumption in Montreal, Quebec, Canada, where electric devices are the primary source of residential heating. This increases electricity consumption and complicates forecasting, since the duration of use and the number of houses vary. The study treated temperature as the main influential factor and used an adaptive conditional density estimation (ACDE) model, compared with a multi-layer perceptron (MLP) and recursive least squares (RLS), to forecast heating and air-conditioning consumption and to analyze the thermal behavior of the building.
S. Bruno et al. [47] reported on an Italian R&D project on public buildings that aimed to design an energy management system (EMS) device for optimizing and controlling distributed energy sources. The work mainly contributes control modules for EMS applications that can flexibly adopt different forecasting techniques (ES, ARIMA, and NN) and relevant impact variables. Model predictive control is used in a hierarchical structure to limit the impact of forecast errors.

B. COMMERCIAL STUDIES
This section reviews state-of-the-art methods for the commercial sector, including buildings, offices, and institutes, considering the indoor, outdoor, design, layout, social, and economic factors that influence electrical load forecasting.
Weiwu Ma et al. [48] discussed distributed energy systems (DESs), targeting district load forecasting (DLF), and presented the different facets affecting district load. Top-down and bottom-up methods were classified and compared to identify current shortcomings in forecasting, in terms of either forecast accuracy or the workload burden on the system. The study also discusses the indoor, design, layout, social, and economic factors that underlie building-related load forecasting.
Bishnu Nepal et al. [49] proposed a method combining k-means clustering and auto-ARIMA for forecasting peak electrical load in buildings, using 2017-2018 data from Chubu University (East Campus), to support strategies for reducing peak load and curtailing electricity bills.
Yongbao Chen et al. [50] focused on demand response (DR), which depends on the difference between baseline and actual load. Building on the capabilities of support vector regression (SVR), they presented an approach to improve load forecasting for office buildings using historical load and weather data, and reported that the dry-bulb temperature from two hours earlier ("pre-two hours") can improve forecasting accuracy. The dataset comprised two months of load data from four buildings for working hours (9 AM to 5 PM), including the work schedule factor.
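The "pre-two hours" temperature input amounts to a lagged feature. The sketch below shows only the feature construction for an hour-ahead regressor such as SVR; the additional load lags of 1 and 24 hours and the function name are hypothetical choices, not the feature set of [50].

```python
def build_features(load, temperature, temp_lag=2, load_lags=(1, 24)):
    """Build (features, targets) for hour-ahead load regression.

    Each row holds the dry-bulb temperature `temp_lag` hours earlier and the
    loads `load_lags` hours earlier; hours without full history are skipped.
    """
    start = max(temp_lag, *load_lags)
    rows, targets = [], []
    for t in range(start, len(load)):
        features = [temperature[t - temp_lag]] + [load[t - lag] for lag in load_lags]
        rows.append(features)
        targets.append(load[t])
    return rows, targets
```

The resulting rows and targets can be passed to any regressor; keeping the lag choices as parameters makes it easy to compare, for example, a two-hour against a zero-hour temperature lag.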
Min Duan et al. [51] presented a model that uses feature selection to choose the best inputs. The model was built with an SVM with finely tuned parameters to forecast the aggregated load of the selected buildings, taking the impact of electric vehicles into account.
Hanane Dagdougui et al. [24] examined an ANN trained with Bayesian regularization (BR) and Levenberg-Marquardt (LM) algorithms for STLF of district buildings on a downtown campus in Montreal. The study evaluates the ANN and its backpropagation learning algorithm for hour-ahead and day-ahead forecasting of different building types, and investigates how the internal architecture and different input features (weather conditions and previous load data) affect forecasting ability. Better performance was observed for one-hour-ahead forecasting.
Yang Liu et al. [52] outlined a new prediction technique for building-level load forecasting that uses sliding window empirical mode decomposition (SWEMD) together with a feature selection method that maximizes relevancy and minimizes redundancy based on Pearson's correlation (MRMRPC). The forecasting engine combines an improved Elman neural network (IENN) with a novel shark smell optimization (NSSO) algorithm.
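The relevancy/redundancy trade-off behind MRMRPC can be illustrated with a greedy selection that scores each candidate by its absolute Pearson correlation with the target minus its mean absolute correlation with the features already chosen. This difference criterion is the classic mRMR form and is assumed here; [52] may weight or combine the terms differently.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def mrmr_pearson(features, target, n_select):
    """Greedy max-relevance min-redundancy selection using |Pearson r|.

    features: dict mapping a feature name to its list of values.
    """
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}

    def score(name, selected):
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(features[name], features[s])) for s in selected)
        return relevance[name] - redundancy / len(selected)

    selected, remaining = [], set(features)
    while remaining and len(selected) < n_select:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda name: score(name, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a feature, an exact scaled copy of it, and a weaker but independent feature, the copy scores close to zero after the first pick because of its redundancy, so the independent feature is selected second.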
Elliot Skomski et al. [53] explored sequence-to-sequence RNNs for STLF in a case study of four commercial office buildings in eastern Washington. The study found that a longer forecast window increases the probability of poor predictions. It investigated neural network performance at different time resolutions (1-, 15-, or 60-minute data), the dependence on the date range of the training data, decay effects, and short-term context. The sensitivity of sequence-to-sequence models to hyperparameters and the generalization of a model trained on one building to other buildings were also investigated.
Hamid Chitsaz et al. [54] explored a new prediction model for an educational building supplied by a micro-grid, using a self-recurrent wavelet neural network (SRWNN) trained with the LM algorithm. The model forecasts the volatile and disrupted load time series of the British Columbia Institute of Technology (BCIT), Vancouver, whose load fluctuates severely from day to day compared with conventional grids.
Zulfiqaar A.K et al. [55] developed a hybrid method combining a CNN and an LSTM autoencoder (LSTM-AE) for energy forecasting in residential and commercial buildings (a UCI dataset from France and a Korean commercial building, respectively), using smart meter data, and claimed to be the first to incorporate utility preprocessing into such a framework. After preprocessing and normalization (outlier and missing-value removal; because the data came from smart meters, no such values were found), the CNN extracts features from the input. The features are passed to one LSTM module, which generates an encoded sequence, and a second LSTM module decodes it to predict energy.

C. INDUSTRIAL STUDIES
The industrial sector accounts for an enormous share of worldwide electricity consumption. Industries either generate their own power through power plants or manage power supplied by the national or a contractual grid together with their own generation. Industrial consumption varies and grows over time as a region, state, or country industrializes for economic growth. This sector therefore needs explicit exploration as a separate field of short-term electrical load forecasting.
A facility's power plant is one of its most crucial departments, as it enables the other departments to complete their operations on time; a representative industrial power plant is shown in Fig. 11. Future power requirements should therefore be anticipated with a suitable approach to prevent downtime at an economical cost. The load forecast is the main driving factor here, playing a vital role in deciding how and when to expand the power facility.
STLF can be regarded as a nonlinear forecasting problem. It is therefore essential for power plants serving the industrial or commercial sector to consider the influential parameters affecting the forecast, such as the production process, meteorological factors and related variables, day, time, and events. Models that ignore these factors can produce poor predictions, potentially causing losses of millions of dollars in operations, maintenance, and generation. Some of the studies focusing on industries are discussed in this section to highlight the sector's importance.
Antonio Bracale et al. [56] provided insights into industrial load modeling using a set of MLR models for an Italian transformer manufacturing factory, characterizing production and working shifts with qualitative variables to obtain a valid industrial load forecast. Load was measured at the total, feeder, and individual-load (machines, buildings, etc.) levels. The study analyzed how including or excluding industrial load information affects modeling efficiency. In addition, Antonio Bracale et al. [57] presented active and reactive power forecasting for probabilistic industrial load forecasting, investigated with univariate methods (quantile regression forests, QRFs) focusing on suitable variables and with multivariate methods (vector autoregressive model with exogenous variables, VARX), using the same factory dataset, which is characteristically heterogeneous.
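Qualitative variables such as working shifts typically enter an MLR model as 0/1 dummy variables, with one level left out as the baseline to avoid collinearity with the intercept. The sketch below assumes a hypothetical production-volume regressor and hypothetical shift names; it illustrates the encoding only and is not the model set of [56].

```python
import numpy as np


def fit_mlr_with_shifts(production, shifts, load, levels):
    """Fit load ~ intercept + production + shift dummies by least squares.

    levels[0] is the baseline shift and gets no dummy column.
    Returns a dict mapping coefficient names to estimates.
    """
    production = np.asarray(production, dtype=float)
    columns = [np.ones_like(production), production]
    names = ["intercept", "production"]
    for level in levels[1:]:
        # 1.0 for hours worked in this shift, 0.0 otherwise.
        columns.append(np.array([1.0 if s == level else 0.0 for s in shifts]))
        names.append(f"shift={level}")
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, np.asarray(load, dtype=float), rcond=None)
    return dict(zip(names, coef))
```

Each shift coefficient is read as the average load difference from the baseline shift at equal production.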
D. H. Kim et al. [58] studied two small-scale industries in Silicon Valley (CA, USA), focusing on peak-load prediction using different combinations of regression models (bagging, random forest, extra trees, AdaBoost, and gradient boosting regressors). Data were collected at 5-minute intervals over 2 years. The study contributes to understanding and characterizing peak load, with feature selection and evaluation of underestimation. The model was designed as a generalized process and adds compensation values to the predicted results.
Antonio Bracale et al. [59] presented a probabilistic industrial load forecasting approach covering the active and reactive power of an Italian factory. The method has two stages: the first generates individual univariate forecasts of active and reactive power, and the second combines these forecasts in a multivariate regression model (QRF or URF).
Yong-Feng Zhang and Hsiao-Dong Chiang [60] presented a two-layer neural network framework for STLF in industrial applications, consisting of Enhanced ELITE (E-ELITE, an accurate and diverse ensemble of neural networks with optimal structure) and consensus-based mixed-integer PSO trust-tech (CMPSOATT) for high-quality solutions, efficient computation, and accurate STLF. The E-ELITE model constructs several individual NN models and selects the best of them based on accuracy. The network architecture comprises three stages: exploration and exploitation in the bottom layer and ensembling in the top layer.
Sungwoo Park et al. [61] investigated combined cooling, heat, and power (CCHP) in Korea, where energy markets are moving toward SG transformation, aiming to curtail power generation costs through accurate STLF. They proposed a two-stage model: the first stage uses extreme gradient boosting (XGB) and random forest (RF) models, and the second uses deep neural networks (DNNs) to calculate the optimal operation schedule, minimize electricity charges, and account for the electric rate. Model performance was assessed with Wilcoxon and Friedman tests.
Yusha Hu et al. [62] presented a case study discussing STLF for the process industry (papermaking) in detail. The study presents a "GA-PSO-BPNN" model in which GA-PSO optimizes the parameters (weights and thresholds) of a BPNN, a multilayer feed-forward network trained by backpropagation. Data from two different process industries were used to verify the proposed model and to make a comparison. Process industry enterprises in China account for around 70% of the country's electricity consumption. The study comprises four stages: (1) obtaining historical data; (2) preprocessing, including outlier removal with the 3-sigma rule, gap filling by interpolation, and noise filtering with moving average (MA) and KKF methods; (3) training and forecasting; and (4) evaluation. Input variables affecting the load forecasts were drawn from both the external and the internal industrial environment.
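The first three preprocessing steps named above (3-sigma outlier removal, interpolation, and moving-average smoothing) can be chained as in the following sketch. It assumes linear interpolation and a centered moving average with a window of 3, and it does not reproduce the KKF filter. Note that the 3-sigma rule needs a reasonably long series: in a very short one, a single spike inflates the standard deviation enough that it can never lie 3 standard deviations from the mean.

```python
import statistics


def three_sigma_mask(values):
    """Replace points more than 3 standard deviations from the mean with None."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [v if abs(v - mean) <= 3 * std else None for v in values]


def linear_fill(values):
    """Fill None gaps by linear interpolation between the nearest valid points.

    Leading and trailing gaps take the nearest valid value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def preprocess(values, window=3):
    """Outlier removal, then gap filling, then smoothing."""
    return moving_average(linear_fill(three_sigma_mask(values)), window)
```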
H.-G. Son et al. [63] presented a demand forecasting technique for the industrial sector that combines time-series clustering with several forecasting methods (double seasonal Holt-Winters, DSHW; TBATS, i.e., trigonometric seasonality, Box-Cox transformation, ARMA errors, trend, and seasonal components; ARIMA variants; and NNAR), using hourly data obtained from advanced metering infrastructure (AMI). Power is predicted with a bottom-up approach after clustering electricity consumption based on autocorrelation and normalized periodogram distances.
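An autocorrelation-based distance compares consumers by the shape of their temporal dependence rather than by load level. The sketch below uses the Euclidean distance between sample autocorrelation vectors at lags 1 to 24 with equal lag weights; the lag range and weighting are assumptions, and the periodogram-based distance used in [63] is not shown. A pairwise distance matrix built this way could feed a hierarchical clustering, after which a bottom-up forecast sums the per-cluster forecasts.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation of a series at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
            for lag in range(1, max_lag + 1)]


def acf_distance(a, b, max_lag=24):
    """Euclidean distance between the autocorrelation vectors of two series."""
    ra, rb = autocorrelation(a, max_lag), autocorrelation(b, max_lag)
    return sum((x - y) ** 2 for x, y in zip(ra, rb)) ** 0.5
```

Because autocorrelation is unaffected by scaling and offset, a large and a small consumer with the same daily rhythm end up close together.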
Y. Wang et al. [64] carried out an experimental study on a company in Hunan, a Chinese province where industries must purchase electricity based on pre-calculated consumption. The study presented an ensemble hidden Markov model (e-HMM) for STLF that learns the characteristics and dynamics of the patterns formed by an industrial consumer's power consumption over time, using date, meteorological, and electrical features selected by grey relational analysis (GRA) to reduce redundancy and improve model efficiency. A "log-likelihood" strategy with time windows relates new data to the patterns the HMMs learned from past data, improving accuracy and prediction results. A bagging ensemble framework was also used to reduce the prediction errors of a single model. The study mainly contributes a combined data mining and prediction approach and an integrated framework of multiple HMMs. The model was compared with state-of-the-art SVR, RF, CNN, and LSTM methods.
A. O. Hoori et al. [65] used ISO New England input features and hourly load data to train and validate a multicolumn radial basis function neural network (MCRN) and a radial basis function neural network (RBFN), and further investigated high renewable penetration. The study split the input features of large datasets into multiple sub-datasets, each trained on an individual RBFN in the same manner, reducing computation time. It also provided insights into the reliability of distributed renewable generation potential and identified the features that most affect load forecasting, for planning under either energy shortage or surplus.
A. Ahmed et al. [66] presented a model with high accuracy and a fast convergence (AFC) rate for STLF in industrial applications, although it cannot forecast two or more days ahead. The study improves the accuracy and convergence of existing ANN-based STLF methods through modifications of two techniques: mutual information-based feature selection and an enhanced differential evolution (EDE) algorithm. The proposed AFC-STLF model comprises a feature selector, which removes redundant and irrelevant inputs; a forecaster module, built on an ANN, where training and validation are performed; and an optimizer module, where the error is calculated and minimized with the iterative mEDE algorithm.
P. Kou and F. Gao [67] focused on probabilistic day-ahead load forecasting for energy-intensive enterprises (EIEs) with a Gaussian process-based model, called a "sparse heteroscedastic" model, using data from the power plant of a steel manufacturer in China, collected by a SCADA system at 5-minute averaged intervals over 5 months. The autocorrelation was plotted to observe the nature of load consumption. The heteroscedastic Gaussian process (HGP) has high computational complexity, which complicates practical implementation; l1/2 regularization is therefore used to overcome this issue.
S. Ungureanu et al. [68] assessed the efficiency of machine learning models for industrial load forecasting. The dataset came from a meat processing facility (capacity of almost 35 tons of meat per day) and consists of hourly consumption values. The study implemented RF and LSTM algorithms in Python with the TensorFlow and Keras machine learning libraries. It also investigated the dependencies of electricity consumption, including meat production and the storage warehouse, which is directly linked to temperature; sales are also an essential factor.
M. Tan et al. [69] proposed an ensemble deep learning model for ultra-short-term power demand forecasting and control. The study improves forecasting accuracy and robustness through a hybrid strategy of models and enhances the model's ability to forecast peak demand. The proposed model takes demand and external data as input, conducts exploratory data analysis (EDA) on the raw data, normalizes it, and trains several groups of LSTM networks offline on subsets and subspaces of the data (LSTM was chosen for its strong time-series processing capability). Performance was assessed on an open dataset from the Australian Energy Market Operator (AEMO), using the same dataset for training and testing. However, long-term peak demand forecasting still needs improvement.
D.-M. Petrosanu [70] provided a month-ahead hourly forecast of electricity consumption for medium industrial consumers using customized NARX, ANN, and LSTM models on a timestamped dataset of 8760 hourly records. The data came from a smart meter database, so no missing or abnormal values were observed, which makes the forecast more reliable. The method harnesses custom-designed combinational models: NARX-ANN generates the initial daily forecast, which LSTM-ANN then refines for better precision and accuracy; a parallel computing architecture was also used.

D. GRID STUDIES
This section reviews electrical load forecasting methods proposed for electrical grids under different conditions, regions, and types of application. An exemplary smart power grid architecture with a two-way flow of information is shown in Fig. 12. The grid studies are reviewed below.
Ahmed I. Saleh et al. [71] presented a load forecasting strategy using data mining techniques together with feature selection and outlier rejection. Feature selection is performed by a genetic-based feature selector (GBFS) and a rough set-based feature selector (RBFS). The selected data are then passed to kNN and naive Bayes (NB) classifiers for load prediction.
Neethu Mohan et al. [72] developed an STLF model based on a data-driven strategy employing dynamic mode decomposition (DMD), a method known for information extraction that captures the spatial-temporal dynamics of the underlying system. The authors report that the model efficiently identifies the characteristics of the load data and the factors affecting it (time, calendar, meteorological, social, and economic), improving forecasting accuracy and handling complexity in grid regions. Tests were performed on data from two energy market operators, in Australia and North America, and the results compared favorably with other state-of-the-art methods.
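The core of DMD is a best-fit linear operator that advances one snapshot to the next, estimated through a truncated SVD. The sketch below implements exact DMD with NumPy and rolls the fitted dynamics forward to forecast. How [72] arranges load data into snapshots is not specified here; treating each column as a state vector (for example, hourly loads of several zones) is an assumption.

```python
import numpy as np


def dmd_forecast(snapshots, rank, steps):
    """Exact DMD: fit x_{k+1} ~= A x_k on snapshot columns and roll forward.

    snapshots: (m, n) array whose columns are consecutive states.
    Returns an (m, steps) array of forecasts following the last snapshot.
    """
    X = np.asarray(snapshots, dtype=float)
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T
    # Reduced operator A_tilde = U* X2 V S^-1 (division by s scales columns).
    A_tilde = U.conj().T @ X2 @ V / s
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ V / s @ W  # exact DMD modes
    # Mode amplitudes fitted to the last snapshot, advanced by eigenvalue powers.
    b = np.linalg.lstsq(modes, X[:, -1].astype(complex), rcond=None)[0]
    future = [modes @ (b * eigvals ** k) for k in range(1, steps + 1)]
    return np.real(np.column_stack(future))
```

When the data follow linear dynamics exactly, the forecast reproduces them; for real load data, the truncation rank controls the trade-off between capturing dynamics and fitting noise.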
Xiangyu Kong et al. [73] presented a DMD-based forecasting method built on error correction, applied at both the data selection stage and the forecasting error stage. In the data selection stage, GRA selects the model inputs (previous-day data, same-day data from the previous week, and similar-day data). The DMD load forecasts are further refined for stable predictions with the extreme value constraint method (EVCM). However, prediction accuracy varies by area: it is lower for small areas and higher for large ones.
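Grey relational analysis, used for input or similar-day selection in [64] and [73], ranks candidate days by how closely their normalized features track those of a reference day. The sketch below computes unweighted grey relational grades with the customary distinguishing coefficient ρ = 0.5; the per-feature min-max normalization, the parameter value, and the example features are assumptions, not the exact procedures of those studies.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate feature vector w.r.t. a reference."""
    rows = [list(reference)] + [list(c) for c in candidates]
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    norm = [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]
    ref, cands = norm[0], norm[1:]
    deltas = [[abs(r - c) for r, c in zip(ref, cand)] for cand in cands]
    d_min = min(min(d) for d in deltas)
    d_max = max(max(d) for d in deltas)
    if d_max == 0:  # every candidate equals the reference
        return [1.0] * len(cands)
    # Grade = mean grey relational coefficient over the features.
    return [sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
            for row in deltas]


def most_similar_days(reference, candidates, top=2, rho=0.5):
    """Indices of the `top` candidates with the highest grades, plus all grades."""
    grades = grey_relational_grades(reference, candidates, rho)
    order = sorted(range(len(candidates)), key=lambda i: grades[i], reverse=True)
    return order[:top], grades
```

For example, with hypothetical features [maximum temperature, minimum temperature, working-day flag], a reference day [30, 20, 1] ranks the candidates [30, 20, 1] and [29, 19, 1] well above [10, 5, 0].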
Nian Liu et al. [74] presented a hybrid forecasting model for micro-grids. Empirical mode decomposition (EMD) decomposes the load data into components of different scales and trends; an extended Kalman filter (EKF) and a kernel extreme learning machine (KELM) form a dual-prediction forecasting model; and PSO optimizes the parameters. Four micro-grids with different load characteristics were used to verify the method: two small residential areas and two commercial buildings.
Ying Nie et al. [75] proposed a hybrid model (RBF-GRNN-ELM) with combined data preprocessing, using datasets from Queensland and Victoria (Australia). Complementary ensemble EMD is used for data decomposition, while singular spectrum analysis (SSA) is used for information extraction and forecasting. The results of the combined RBF-GRNN-ELM model are further improved with a multi-objective grey wolf optimizer (MOGWO). After data preprocessing, standalone forecasts are produced, and an aggregated model is then constructed.
Using a dataset from a smart grid (SG) in Singapore, Yeming Dai and Pei Zhao [76] proposed improvements to SVM in the form of a hybrid model that combines intelligent feature selection with a parameter optimization method (second-order oscillation and repulsion PSO) to improve forecasting accuracy. The study treats price as one of the major factors driving variations in consumption patterns, so price is used as an input together with other features selected for relevance and redundancy, and a weighted grey relational projection is employed to handle holidays.
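Several reviewed models, including [76], tune hyperparameters with PSO. The following is a sketch of standard global-best PSO, not the second-order oscillation and repulsion variant used in [76]; the inertia and acceleration coefficients are common textbook values chosen as assumptions. In an SVM setting, `objective` would be a hypothetical validation-error function of the penalty and kernel parameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO (minimization); coefficients are assumptions."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # cognitive term
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # social term
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For example, minimizing the stand-in objective `(p[0] - 1) ** 2 + (p[1] + 2) ** 2` over `[(-5, 5), (-5, 5)]` should return a point close to (1, -2).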
A. Ghasemi et al. [77] propose a price and demand forecasting method that combines tools for pre-processing, forecasting, and algorithm tuning. The method has three stages. The first decomposes the signal into multiple sub-signals at different resolutions using the flexible wavelet packet transform (FWPT) and selects features from the input data using conditional mutual information (CMI). The second stage is a multiple-input multiple-output (MIMO) model comprising a nonlinear least squares SVM (NLS-SVM) and ARIMA for correlation analysis. The last stage uses a time-varying artificial bee colony (ABC) algorithm to tune the parameters.
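Conditional mutual information, used in [77] for feature selection, measures how much a candidate input X tells about the target Y beyond what an already selected input Z provides. The sketch below is a plug-in estimator for discrete (for example, binned) data, assuming binning has already been done; [77] may use a different estimator for continuous signals.

```python
import math
from collections import Counter


def conditional_mutual_information(samples):
    """I(X;Y|Z) in bits from a list of discrete (x, y, z) observations.

    Assumes the data are already discretized (an illustrative assumption).
    """
    n = len(samples)
    c_xyz = Counter(samples)
    c_xz = Counter((x, z) for x, _, z in samples)
    c_yz = Counter((y, z) for _, y, z in samples)
    c_z = Counter(z for _, _, z in samples)
    cmi = 0.0
    for (x, y, z), count in c_xyz.items():
        # p(z) p(x,y,z) / (p(x,z) p(y,z)), with probabilities as relative counts.
        ratio = (c_z[z] / n) * (count / n) / ((c_xz[(x, z)] / n) * (c_yz[(y, z)] / n))
        cmi += (count / n) * math.log2(ratio)
    return cmi
```

In a greedy selection loop, the candidate with the largest CMI given the inputs already chosen would be added next.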
Youlong Yang et al. [78] forecasted short-term power load using SVR based on the sequential grid approach (SGA), which subsamples the parameter space and then fine-tunes the promising regions. The study contributes an efficient SVR framework for improving STLF accuracy: it improves the SVR calculation intervals, introduces sequential inference into the selection process, and validates the framework on two real cases, from Jiangxi province (China) and a California electric utility.
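The idea behind the sequential grid approach in [78], sampling the parameter space coarsely and then refining around promising regions, can be illustrated with a plain coarse-to-fine grid search over two parameters (for SVR, typically log10 C and log10 γ). This is a simplified stand-in that omits the sequential inference step of [78]; the grid size, number of rounds, and shrinking rule are assumptions.

```python
def coarse_to_fine_search(objective, bounds, points_per_axis=5, rounds=10):
    """Grid-search a 2-D box, then repeatedly shrink the box around the best point.

    Grid size, rounds, and the shrink rule are illustrative assumptions.
    """
    (lo1, hi1), (lo2, hi2) = bounds
    best = None
    for _ in range(rounds):
        step1 = (hi1 - lo1) / (points_per_axis - 1)
        step2 = (hi2 - lo2) / (points_per_axis - 1)
        grid = [(lo1 + i * step1, lo2 + j * step2)
                for i in range(points_per_axis) for j in range(points_per_axis)]
        best = min(grid, key=lambda p: objective(*p))
        # Next round: a box one grid step either side of the current best point.
        lo1, hi1 = best[0] - step1, best[0] + step1
        lo2, hi2 = best[1] - step2, best[1] + step2
    return best
```

With five points per axis, each round halves the width of the search box, so ten rounds narrow it by a factor of about a thousand.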
Feifei He et al. [79] proposed a day-ahead quantile regression forest (QRF) load forecasting model based on a decomposition strategy. The load data are decomposed into sub-series with variational mode decomposition (VMD), and the weighted temperature and humidity index (WTHI) and the day type are considered as influencing factors for each sub-series. A multistep strategy then forecasts each sub-series with QRF, and the sub-series forecasts are recombined into a complete forecast whose probability density is estimated with kernel density estimation (KDE). Parameters were optimized with a Bayesian optimization algorithm (BOA) based on the tree-structured Parzen estimator (TPE), using real load data from Henan province, China.
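In [79], KDE turns the QRF quantile forecasts for an hour into a probability density. A minimal Gaussian KDE is sketched below, assuming the predicted quantiles are treated as samples and using Silverman's rule of thumb for the bandwidth when none is given; both are simplifying assumptions rather than details confirmed by [79].

```python
import math
import statistics


def silverman_bandwidth(samples):
    # Silverman's rule of thumb for a Gaussian kernel (an assumed default).
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-1 / 5)


def gaussian_kde(samples, x, bandwidth=None):
    """Gaussian kernel density estimate of `samples` evaluated at point x."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(samples)
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
```

Evaluating `gaussian_kde` on a fine grid of load values gives a density curve whose area is approximately one.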
Muhammad Qamar Raza et al. [80] used a three-layer feed-forward ANN (FF-ANN) for week-ahead load forecasting over a one-year period, trained with global-best PSO (GPSO), which optimizes the network's weights and biases, to enhance forecasting performance. The model inputs comprise meteorological, exogenous, and load data, and the model was tested and validated on data from the ISO New England grid. The network architecture had 8 input neurons, 20 hidden neurons, and 1 output neuron, and the tuning parameters included the number of particles, the time interval, and the global-best and personal-best (Gbest and Pbest) components.
I. P. Panapakidis [81] focused on the buses of transmission and distribution systems, using urban, suburban, and industrial load data from 10 buses in Thessaloniki, northern Greece. The study showed that bus load is more stochastic than system load and incorporated bus-load attributes into a day-ahead and hour-ahead prediction model based on an ANN combined with a clustering technique; the resulting hybrid model was reported to produce efficient results.
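For the GPSO-trained network of Raza et al. [80], each particle encodes all weights and biases of the 8-20-1 FF-ANN as one flat vector, and its fitness is the training error. The sketch below shows that encoding; the tanh hidden activation, linear output, and mean-squared-error fitness are assumptions, and the PSO loop itself is omitted.

```python
import math

N_IN, N_HIDDEN, N_OUT = 8, 20, 1  # 8-20-1 architecture reported in [80]
N_PARAMS = N_IN * N_HIDDEN + N_HIDDEN + N_HIDDEN * N_OUT + N_OUT


def unpack(params):
    """Split one flat particle position into weight matrices and bias vectors."""
    i = 0
    w1 = [params[i + r * N_IN: i + (r + 1) * N_IN] for r in range(N_HIDDEN)]
    i += N_IN * N_HIDDEN
    b1 = params[i:i + N_HIDDEN]
    i += N_HIDDEN
    w2 = [params[i + r * N_HIDDEN: i + (r + 1) * N_HIDDEN] for r in range(N_OUT)]
    i += N_HIDDEN * N_OUT
    b2 = params[i:i + N_OUT]
    return w1, b1, w2, b2


def forward(params, x):
    # Assumed activations: tanh hidden layer, linear output layer.
    w1, b1, w2, b2 = unpack(params)
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def mse_fitness(params, inputs, targets):
    # Fitness a PSO particle would minimize (assumed: mean squared error).
    errors = [(forward(params, x)[0] - t) ** 2 for x, t in zip(inputs, targets)]
    return sum(errors) / len(errors)
```

Under these assumptions each particle is a 201-dimensional vector (160 + 20 + 20 + 1 parameters).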
Salah Bouktif et al. [82] proposed an enhanced LSTM model that accounts for the periodic characteristics of electrical load by using multiple sequences of time-lagged inputs, capturing sequence features to improve the accuracy of aggregated load forecasts over the target horizon. The dataset and meteorological variables were processed to handle null values and detect outliers before being used in the LSTM and gated recurrent unit (GRU) models. In another study, Bouktif et al. [42] used an LSTM-RNN deep learning method for STLF, with GA and PSO used to search for and tune the hyperparameters. The selected hyperparameters were then used to construct and train the final LSTM model, which was compared with RF, SVR, and other LSTM benchmarks. The dataset came from RTE Corporation (the French electricity transmission network operator) and contained nine years of data at a 30-minute resolution.
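The multiple-sequence time-lag input used by Bouktif et al. [82] starts from reshaping the load series into lagged samples. A sketch of that step follows; the function name and lag values are illustrative. With the 30-minute RTE data of [42], lags such as [1, 2, 48, 336] would cover the previous hour, the same half-hour yesterday, and the same half-hour last week, but these are hypothetical, not the lags chosen by the authors.

```python
def make_lagged_dataset(series, lags):
    """Turn a load series into (features, target) pairs using the given lags.

    Each sample at time t uses series[t - lag] for every lag as inputs and
    series[t] as the target. Lag values are illustrative assumptions.
    """
    max_lag = max(lags)
    features, targets = [], []
    for t in range(max_lag, len(series)):
        features.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return features, targets
```

The resulting feature rows can then be reshaped into the (samples, time steps, features) layout that recurrent networks expect.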
Jian Zheng et al. [23] used a long-term electricity consumption dataset, together with an airline dataset for comparison, to verify the suitability of LSTM-RNN models against benchmark models (SARIMA, NARX, and SVR) in capturing long-term dependencies in electrical load for accurate forecasting.

E. OFF-GRID STUDIES
Grid-isolated power plants, such as those serving islands, remote communities, and industrial operations, are typically cut off from the main grid infrastructure and therefore face unique challenges. These plants operate differently because of their size: they are characteristically smaller and less diverse than most market grids. As a result, stability often cannot be maintained by generation alone, which leads to load disconnections. This section therefore reviews studies of off-grid power plants along with some SG studies.
Yuanzhang Sun et al. [83] discussed the efficiency of wind power transmitted over long distances versus consumed locally, presenting the challenges and solutions of each approach. They compared the two approaches in terms of security, reliability, capability, and cost, as well as economic and technical feasibility, in one of China's large isolated industrial power systems. The study also argued that wind power can be utilized efficiently in isolated super micro-grids around the globe.
Danilo P. e Silva et al. [84] present a hybrid optimization model for an isolated microgrid that uses meteorological forecasts to schedule ancillary services, purchasing power when it is required and selling it when there is an excess. The optimization algorithm follows specific rules for connecting to and disconnecting from the primary grid (including a minimum connection time) while accounting for the operation of the battery bank.
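The rules of [84] are not reproduced here, so the sketch below is hypothetical: it connects the microgrid to the primary grid when the battery state of charge (SOC) falls below a lower threshold and disconnects it once SOC exceeds an upper threshold, but only after a minimum connection time has elapsed. The thresholds, time step, and function name are assumptions.

```python
def grid_connection_schedule(soc_series, soc_low=0.2, soc_high=0.8,
                             min_connected_steps=3):
    """Hypothetical connect/disconnect rule with a minimum connection time.

    Thresholds and the step count are illustrative assumptions, not from [84].
    """
    connected = False
    steps_connected = 0
    schedule = []
    for soc in soc_series:
        if not connected and soc < soc_low:
            connected, steps_connected = True, 0   # battery low: draw from the grid
        elif (connected and soc > soc_high
              and steps_connected >= min_connected_steps):
            connected = False                      # battery full and minimum time met
        if connected:
            steps_connected += 1
        schedule.append(connected)
    return schedule
```

In the usage test, the high SOC at the fourth step does not trigger disconnection because the minimum connection time has not yet been reached.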
Life Lozano et al. [85] presented a case study of Gilutongan Island in the Philippines, one of many off-grid islands with limited electricity, where a 194 kVA diesel generator (DG) powers the island. The study uses the Hybrid Optimization Model for Electric Renewables (HOMER) to analyze the power generation needed to provide electricity on the island 24/7.
Arslan Ahmed Bashir et al. [86] presented an approach for the economical and reliable operation of islanded and grid-connected micro-grids (MGs), using a framework for energy scheduling and for managing the uncertainties associated with islanded grid (IG) and connected micro-grid (C-MG) operation. The MG comprises various renewable energy sources (RESs), such as solar, wind, tidal, and biomass, as well as diesel generators (DGs). The proposed mixed-integer linear programming (MILP) model operates 24 hours ahead and updates its input data, minimizing the MG's cost while respecting consumer comfort priorities (HVAC and electric water heaters, EWH); the study also investigates the parametric impacts on the MG's cost and risk indices over a year in both IG and C-MG modes.
Asif Islam et al. [87] performed an electrical load forecasting analysis for an isolated island (Kutubdia, Bangladesh) with no historical load demand data. In the absence of such data, the analysis was based on the driving factors of load growth in the area and used two methods: the inverse matrix method and regression analysis.
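With no historical demand data, [87] relates load to its driving factors through regression. The sketch below fits a multiple linear regression by the normal equations, β = (XᵀX)⁻¹Xᵀy, which is one common meaning of an inverse-matrix solution; whether this matches the inverse matrix method of [87] is an assumption, and the driving factors are hypothetical. It assumes NumPy.

```python
import numpy as np


def fit_load_growth(drivers, load):
    """Ordinary least squares via the normal equations: beta = (X^T X)^-1 X^T y.

    `drivers` is an (n_observations, n_factors) array of hypothetical
    load-growth driving factors; a column of ones adds the intercept.
    """
    X = np.column_stack([np.ones(len(drivers)), np.asarray(drivers, dtype=float)])
    y = np.asarray(load, dtype=float)
    return np.linalg.inv(X.T @ X) @ X.T @ y
```

Numerically, `np.linalg.lstsq` is usually preferred to forming the explicit inverse; the inverse is kept here only to mirror the formula. A projected load is then `beta @ [1, factor_1, factor_2, ...]` for projected factor values.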
Shengxi Yuan et al. [88] performed a case study on Sao Vicente, a small island of Cape Verde in the Atlantic Ocean with high wind resources, which has an isolated electrical grid that depends primarily on DGs and has integrated wind power with a capacity of 5.950 MW. For conservative forecasting, the study uses ARIMA to forecast wind speed and combines it with energy storage. The study also discusses the cost savings that result from renewable integration. The forecast uses an hourly rolling mechanism, and the study reports notable improvements in reliability, fuel savings, and wind capacity.
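The hourly rolling mechanism of [88] refits the model as each new observation arrives and forecasts one step ahead. The sketch below shows that loop with an AR(1) model fitted by least squares as a simplified stand-in for ARIMA; a full ARIMA model adds differencing and moving-average terms and would normally come from a statistics library. Function names and the warm-up length are illustrative.

```python
def fit_ar1(history):
    """Least-squares AR(1) fit around the sample mean: x_t - mu = phi * (x_{t-1} - mu).

    A simplified stand-in for the ARIMA model used in [88].
    """
    mu = sum(history) / len(history)
    dev = [v - mu for v in history]
    den = sum(d * d for d in dev[:-1])
    if den == 0:
        return mu, 0.0  # flat history: fall back to forecasting the mean
    num = sum(a * b for a, b in zip(dev[1:], dev[:-1]))
    return mu, num / den


def rolling_ar1_forecast(series, warmup):
    """Hourly rolling forecast: refit on all data seen so far, predict the next hour."""
    forecasts = []
    for t in range(warmup, len(series)):
        mu, phi = fit_ar1(series[:t])
        forecasts.append(mu + phi * (series[t - 1] - mu))
    return forecasts
```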
Tanveer Ahmad and Huan Xin Chen [89] studied utilities, IPPs, and industrial consumers, using historical and climate data as inputs to estimate energy usage at monthly, seasonal, and yearly horizons on a large scale. They applied a nonlinear autoregressive model (NARM), linear-model stepwise regression (LMSR), and random forest least-squares boosting (RF-LSBoost) to actual energy data, using clustering analysis to detect and remove outliers. The model produced city-wide energy demand predictions.
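Least-squares boosting, the LSBoost part of RF-LSBoost in [89], fits each new weak learner to the residuals of the current ensemble and adds it with a shrinkage factor. The sketch below uses one-split regression stumps on a single input as weak learners, which is a simplification of the tree ensembles in [89]; the learning rate and number of rounds are assumptions.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on 1-D inputs (minimum squared error)."""
    pairs = sorted(zip(x, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold between equal inputs
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda v: lmean if v <= threshold else rmean


def ls_boost(x, y, n_rounds=50, learning_rate=0.5):
    """Least-squares boosting: each stump is fitted to the current residuals.

    Stump learners, rounds, and learning rate are illustrative assumptions.
    """
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)
```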
Azim Heydari et al. [90] adopted a composite neural network with the gravitational search algorithm (GSA) for price forecasting and STLF in isolated power grids. The model comprises several internal sub-models (variational mode decomposition, mixture data modeling, feature selection, a generalized regression NN, and GSA) for price and load forecasting. Its forecasting accuracy was first tested on the Pennsylvania and Spanish electricity markets, and it was then tested on isolated power-grid data from Favignana Island for winter, spring, summer, and fall.
Ibrahim S. Jahan et al. [91] presented a load forecasting model combined with data clustering techniques to improve the accuracy of electrical load forecasts. The dataset came from an off-grid platform at VSB-Technical University (Ostrava, Czech Republic) and comprised meteorological and power variables. The forecasting setup had three steps: first, the data were clustered using k-means and k-medoids; second, distances were measured with the squared Euclidean distance, which improved the dataset's features; and finally, the forecasting model was trained and tested using a (DT) ANN.
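The clustering step in Jahan et al. [91] groups observations by squared Euclidean distance. Below is a sketch of Lloyd's k-means with that distance; the deterministic farthest-point initialization is an assumption made here for reproducibility, and k-medoids (which restricts cluster centres to actual data points) is not shown.

```python
def squared_euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means using squared Euclidean distance."""
    # Deterministic farthest-point initialization (an assumption for reproducibility).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_euclidean(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: squared_euclidean(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```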
Tolga Turai et al. [92] used a remotely located oil and gas facility (160 days of data at different time intervals) as a case study for electrical load forecasting on off-grid platforms. The prediction model was an ANN (a multi-layer nonlinear autoregressive NN) trained on electrical load data at 60-min and 15-min intervals. Two cascaded nonlinear autoregressive NN models (CNNARX) with different hidden-layer topologies were used to improve forecasting accuracy. Comparing the two resolutions, the 15-min interval produced better results.
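A NARX network such as the CNNARX models in [92] maps past load values and past exogenous inputs to the next load value. The sketch below only builds those regressor vectors; the exogenous signal `u` and the orders `ny` and `nu` are hypothetical. At a 15-min resolution, `ny = 4` spans one hour of past load, whereas the same hour needs only `ny = 1` at 60 min.

```python
def narx_regressors(y, u, ny, nu):
    """Build NARX inputs [y(t-1..t-ny), u(t-1..t-nu)] and targets y(t).

    The exogenous series `u` and the orders are illustrative assumptions.
    """
    start = max(ny, nu)
    X, target = [], []
    for t in range(start, len(y)):
        past_y = [y[t - i] for i in range(1, ny + 1)]
        past_u = [u[t - i] for i in range(1, nu + 1)]
        X.append(past_y + past_u)
        target.append(y[t])
    return X, target
```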
Spyridon Chapaloglou et al. [93] presented a smart energy management algorithm for an islanded power system with different power sources (DG, PV, and battery storage). The algorithm comprises load forecasting, pattern recognition, and customized optimal power flow to manage the system's power flow efficiently. The L.F module uses clustering algorithms and an FF-ANN for day-ahead STLF, and its results are fed to the pattern recognition algorithm, which classifies the forecast according to its load curve. The methodology improves the smart flow of energy for efficient energy management, maximizes the use of RES during peak demand, extends the DG's life, and saves fuel costs.
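In [93], the day-ahead forecast feeds an optimal power-flow stage that schedules the DG, PV, and battery storage. As a much simpler hypothetical stand-in for that stage, the sketch below dispatches a forecast hour by hour with a fixed priority (PV first, then the battery, then the DG), assuming 1-hour steps, a lossless battery without power limits, and curtailment of PV that the battery cannot absorb.

```python
def dispatch(load, pv, capacity_kwh, soc_kwh=0.0):
    """Hypothetical PV-first, battery-second, DG-last hourly dispatch.

    Assumes 1 h steps and a lossless battery with no power limits.
    """
    diesel, soc_trace = [], []
    for demand, solar in zip(load, pv):
        net = demand - solar
        if net <= 0:
            # Store surplus PV; anything beyond capacity is curtailed.
            soc_kwh = min(capacity_kwh, soc_kwh - net)
            diesel.append(0.0)
        else:
            from_battery = min(net, soc_kwh)
            soc_kwh -= from_battery
            diesel.append(net - from_battery)  # DG covers the remaining deficit
        soc_trace.append(soc_kwh)
    return diesel, soc_trace
```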

F. COMPARATIVE STUDY TABLE
A summary of the works reviewed in this paper is presented in Table 4. The table highlights the methods used in the residential, commercial, industrial, grid, and off-grid load forecasting sectors. It presents each approach, discusses its strengths, and reviews its advantages and disadvantages in the remarks column. In the table, "Acc" denotes accuracy, "Dur" durability, "Adapt" adaptability, "Pro" processing time, and "Com" complexity. The scale "*", "**", "***", and "x" denotes good, better, best, and absence of the feature, respectively.

VI. LIMITATIONS OF EXISTING L.F MODELS
Different prediction models have different pros and cons. Some models perform well only on stationary time series, while others perform well on non-stationary time series but can handle only a limited number of parameters. Data in a DGM environment, however, are of a different nature, because they are produced by several generation sources, each affected by different parameters. As a result, these data vary far more than the data used in existing studies, and existing prediction models are unsuitable for such an environment without modification.
With global digitalization and the rise of clean energy production driving the need for SGs and smart cities, models are required that can forecast load with multiple features and withstand changes occurring in real time. The limitations of some existing models, with suggestions for tailoring them to the DGM environment, are presented in Table 5. These suggestions are not exhaustive and may be extended or reduced according to the requirements of a given application.

VII. PERFORMANCE MEASUREMENT CRITERIA
Over time, researchers have used a variety of criteria, with new ones still emerging, to evaluate their load forecasting methods, models, and techniques by quantifying how the predicted values compare with the actual load values. These indices measure the correctness or precision of the predicted values. The criteria most widely used for load forecasting are compiled in Table 6, where N is the number of samples, and Ŷi and Yi denote the model's predicted and actual values, respectively.
The values of these metrics depend on the defined parameters and the datasets used, and each metric has its own pros and cons. For example, RMSE uses a quadratic (second-degree) loss and therefore penalizes large errors more heavily than small ones; MAE measures the average absolute error; and MAPE can be biased and becomes problematic when the actual values in its denominator are zero or close to zero. Consequently, comparing methods evaluated with different metrics is not entirely fair. Nevertheless, Table 7 compiles the MAPE (%) reported for models from the surveyed papers, covering different sectors and generation modalities.
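The three metrics discussed above can be computed directly from their standard definitions, with N the number of samples, Yi the actual values, and Ŷi the predictions. This sketch is not a reproduction of Table 6; the function names are illustrative, and `mape` rejects zero actual values to make the denominator problem explicit.

```python
import math


def mae(actual, predicted):
    # Mean absolute error: (1/N) * sum |Yi - Ŷi|.
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    # Root mean squared error: quadratic loss weights large errors more.
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error in %; undefined when an actual value is zero."""
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)
```

For example, with actual values `[100, 100, 100, 100]` and predictions `[100, 100, 100, 60]`, MAE is 10.0 while RMSE is 20.0, showing how a single large error dominates RMSE.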

VIII. OUTCOMES
The key findings of this survey were drawn from scrutinizing recent studies and comparing state-of-the-art electrical load forecasting models. They are listed in this section under trends (presenting the findings) and research gaps (highlighting unexplored areas) in STLF.

A. TRENDS
The trends identified from the reviewed papers are listed below:
• The forecasting accuracy of a model varies with its application; a change of application, sector, or region can introduce a considerable amount of error.
• Correlation analysis of the factors affecting the load has a significant impact on model accuracy. However, these correlations are region-dependent, so a change of region can adversely affect model accuracy.
• Different papers have considered different meteorological variables depending on their application and region; there is no standard set of variables for load forecasting.
• STLF is considered more reliable for utility operations and planning. It is also used to extend operations further and to contribute to MTLF and LTLF.
• Historical load data contribute to better model design, resulting in better accuracy and model performance.
• Missing data points, noise, and bad data cause prediction errors; however, their effect can be reduced with appropriate algorithms.
• Better accuracy was observed with data points received from smart meters, which contained no bad data or noise.
• Every model has its own merits and demerits; however, hybrid methods were observed to perform better than standalone methods, and hybrid models with fine-tuned parameters performed best.
• Because of the differences between power utility sectors, the models, choice of parameters and algorithms, and load patterns differed and followed different cycles, so a model from one sector cannot simply be replicated in another.
• Various clustering techniques were employed in many papers, and k-means outperformed the others. The recently introduced HDBSCAN may improve results further.
• Optimization algorithms were used extensively in load forecasting models, most often GA and PSO, owing to GA's crossover and mutation operators and PSO's ability to reach optimum fitness, respectively.
• Existing models have limited adaptivity and difficulty dealing with utilities that have different generation modalities.
• The load patterns of grid-isolated industries, DGM, SGs, and smart cities differ because of the intrinsic nature of their operations.
Table 5 (continued). Limitations of existing L.F models and suggestions for tailoring them to DGM.
…SVM out of the league of DGM. However, improved versions that address these issues could be considered.
[44] ARIMA. Limitations: The study performed scenario-based load forecasting in which the time period was limited in order to reduce peak consumption. ARIMA is further limited by its linear nature and its order-selection process. Suggestions for DGM: DGM has a complex load demand structure that requires exploiting leading indicators along with explanatory variables, which is a limitation for such models.
[57] GA-PSO-BPNN. Limitations: The study removed low-quality data and had to refill missing data, so the model was evaluated under favorable conditions; it still needs validation on other datasets. Suggestions for DGM: BPNNs are sensitive to noisy data, and their performance depends on the input data. In DGM, the data contain marginal noise and the inputs keep varying, which can cause this model to perform poorly.
[65] NARX-ANN-LSTM. Limitations: The data were collected from smart meters, so there were no data irregularities, and the model was custom-designed. However, the load data were pre-maintained, the architecture was complex, and autocorrelation was not considered. Suggestions for DGM: DGM data are far more complex than industrial data, in which the loads can be separated, and introducing multiple variables could unbalance the system. Nevertheless, a modified model that includes multiple variables and autocorrelations could be considered for the DGM environment.
[68] DMD-EVCM. Limitations: DMD corrects the load time series, and this correction is bounded by EVCM, which requires highly accurate historical data. Prediction accuracy depends on the size of the area (lower for small areas), and accumulated errors are large. Suggestions for DGM: The model has accuracy issues in small segments. DGM has multiple small segments that must be handled independently, so the accumulated error would be large.
[77] LSTM-GRU. Limitations: GRU has problems with convergence rate, learning efficiency, and fitting. LSTM was introduced to remove the vanishing gradient problem, which still persists; it also faces issues with hyperparameter selection and is prone to overfitting. Suggestions for DGM: The same model is not suitable for multiple types of datasets, and the same parameters are not a benchmark for other types of models; therefore, automatic parameter tuning is required to tailor the model to DGM. Because specific regions have their own dedicated inputs, autocorrelation-based optimization of the inputs, with appropriate settings, would also be required for DGM to avoid trial-and-error configuration.
[94] BPTT-RNN. Limitations: BPTT has contributed to efficient RNN training, but it is suitable only for offline training, and the approach introduces vanishing and exploding gradient problems. Suggestions for DGM: BPTT's limitations with noisy data and online training would be greatly amplified in a DGM environment. A modified RNN with reduced vanishing and exploding gradient issues could be tailored for DGM in combination with other algorithms.

B. RESEARCH GAPS
In this section, we present our observations on the research gaps in electrical load forecasting.

1) CORRELATIONAL ANALYSIS
Existing models follow no standard for including the factors that affect the load. Some studies consider two factors and others more than two, but only a few assess relevance and redundancy to include the most relevant parameters. Forecasting models should therefore include more parameters, whether meteorological, economic, or regional, selected through correlation analysis.
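One simple way to account for relevance and redundancy when choosing inputs is a greedy correlation filter: rank candidate factors by their absolute Pearson correlation with the load, then skip any factor that is highly correlated with one already kept. The thresholds and function names below are hypothetical, and Pearson correlation captures only linear relationships; measures such as mutual information or grey relational grades may suit nonlinear dependencies better.

```python
import math


def pearson(a, b):
    # Pearson correlation; assumes neither sequence is constant.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def select_features(features, load, min_relevance=0.3, max_redundancy=0.9):
    """Keep factors correlated with the load, skipping ones redundant with a kept factor.

    The two thresholds are illustrative assumptions.
    """
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], load)),
                    reverse=True)
    selected = []
    for name in ranked:
        if abs(pearson(features[name], load)) < min_relevance:
            break  # remaining factors are even less relevant
        if all(abs(pearson(features[name], features[s])) <= max_redundancy
               for s in selected):
            selected.append(name)
    return selected
```

In the usage test, a temperature series and the same temperature in Fahrenheit are equally relevant, so only one of them is kept, and an uncorrelated series is dropped.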

2) MODEL ADAPTABILITY PROBLEMS
Existing load forecasting models are limited to dedicated applications and do not produce the desired results when the application changes. This opens a research direction toward designing adaptive models; to the best of our knowledge, no such model was found in the reviewed papers. An adaptive model is defined here as one that can evolve with changing conditions (input parameters, region, or application). Such models would form the basis of load forecasting in environments with different generation modalities.
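As an illustration of what evolving with changing conditions can mean at the smallest scale, the sketch below uses scalar recursive least squares with a forgetting factor, a classic adaptive estimator that was not among the reviewed models. It assumes a single coefficient θ in y = θx; the forgetting factor of 0.9 and the initial values are assumptions. In the usage test, the true coefficient jumps from 2 to 5 halfway through and the estimate follows it.

```python
def rls_track(xs, ys, forgetting=0.9, theta=0.0, p=1000.0):
    """Scalar recursive least squares with a forgetting factor.

    Estimates theta in y = theta * x online; forgetting < 1 discounts old
    samples so the estimate follows changes. Defaults are assumptions.
    """
    estimates = []
    for x, y in zip(xs, ys):
        gain = p * x / (forgetting + x * p * x)
        theta += gain * (y - x * theta)        # correct by the prediction error
        p = (p - gain * x * p) / forgetting    # update the inverse information
        estimates.append(theta)
    return estimates
```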

3) ISOLATED-GRID FORECASTING
Load forecasting models have extensively explored load behavior in grid-connected sectors, whereas sectors isolated from the grid have been neglected. Isolated sectors exhibit different load behavior because of their different structure and production capability. They face serious load-balancing problems and load surges, which stress the system and produce randomly changing load patterns, making future demand difficult for power professionals to predict. Existing models cannot capture this random load behavior without additional custom-designed parameters, so this sector requires further exploration to forecast its intrinsically volatile load accurately.

4) INDEPENDENT POWER PRODUCERS (IPPs)
Independent power producers (IPPs) are also key sources of power production; they are not public utilities but contribute to the power pool exchange. Their dispatch pattern differs because it depends on their position in the pool exchange. Therefore, an accurate model for a specific node could help IPPs manage their power. This sector also needs extensive exploration in load forecast modeling because its nature is distinctive compared with other sectors.

5) PROCESS-BASED INDUSTRIAL LOAD FORECAST
The survey also suggests that process-based industrial load forecasting models could yield more accurate and precise results. Such models can enhance energy management, allow energy to be supplied according to predetermined consumption patterns derived from the model output, and support planning for facility expansion. However, the load varies greatly from one process to another, so the model needs a parallel operational window that runs forecasts for different processes simultaneously and aggregates them at the final stage into a better overall forecast.
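The parallel operational window described above can be sketched as one forecasting task per process, run concurrently, with the per-process forecasts summed into a plant-level total at the final stage. The process names and the seasonal-naive forecaster are hypothetical placeholders for real process models; a thread pool suits forecasters that release the GIL or wait on I/O, while CPU-bound pure-Python models would be better served by `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor


def seasonal_naive(history, season, horizon):
    """Hypothetical per-process forecaster: repeat the last observed season."""
    last = history[-season:]
    return [last[h % season] for h in range(horizon)]


def forecast_plant(process_histories, season, horizon):
    """Forecast each process in parallel, then sum into a plant-level forecast."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(seasonal_naive, hist, season, horizon)
                   for name, hist in process_histories.items()}
        per_process = {name: f.result() for name, f in futures.items()}
    # Final stage: accumulate the per-process forecasts step by step.
    total = [sum(values) for values in zip(*per_process.values())]
    return per_process, total
```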

6) ANOMALY-BASED FORECASTING
Another promising area for further research is load forecasting with anomaly detection. The transformation of the electrical grid from conventional to smart has opened a broad research area: an SG requires massive two-way communication, which generates large amounts of data, and the integration of renewables adds further complexity. Because load patterns in an environment with different generation modalities involve multiple resources, which can hinder accurate load prediction, we believe that models capable of forecasting with anomaly detection will further improve forecasting.
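A minimal baseline for combining forecasting with anomaly detection is to flag readings that deviate strongly from recent history before they are used for training. The sketch below applies a rolling z-score over the preceding window; the window length and three-sigma threshold are assumptions, and the approach is a simple illustration rather than a method from the reviewed papers.

```python
import statistics


def flag_anomalies(series, window=6, threshold=3.0):
    """Flag indices whose value deviates from the preceding window's mean by
    more than `threshold` sample standard deviations (assumed parameters)."""
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu, sd = statistics.mean(past), statistics.stdev(past)
        if sd > 0 and abs(series[i] - mu) > threshold * sd:
            flagged.append(i)
    return flagged
```

Flagged points could then be excluded or replaced, for example by interpolation, before a forecasting model is fitted.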

IX. CONCLUSION
Electrical load forecasting is a key component of power system operations, as many decisions on energy transactions and management are based on it. Its accuracy and precision are therefore of the utmost importance to decision-makers in the energy sector. This paper has presented a detailed review of diverse load forecasting models covering various horizons and sectors. Hybrid methods were observed to achieve better forecasting accuracy, but they still have limitations. We discussed conventional load forecasting methods along with their performance and limitations, discussed the importance of different generation modalities in existing power systems, and presented an analysis of how existing load forecasting methods could be tailored to DGM scenarios (SG and smart city). Finally, the limitations, trends, and research gaps highlighted here should help readers select and evaluate models according to their requirements and identify future research directions. Future work will focus on addressing the limitations of existing load forecasting models in DGM and on improving their adaptability.