Analysis of energy consumption using RNN-LSTM and ARIMA Model

With the spread of smart electricity meters and the wide adoption of electricity generation technologies such as solar panels, a wealth of electricity usage data is available. This data forms a multivariate time series of power-related variables, which can in turn be used to model and even forecast future electricity usage. The Household Power Consumption dataset is a multivariate time series dataset that describes the electricity consumption of a single household over four years. The models were tested on predictions for a specific house and for a block of houses over a given period of time. Over the past couple of decades, energy demand has increased exponentially. This increase places a heavy load on electricity distributors, so forecasting future electricity demand would give the distributor an upper hand. Predicting energy consumption normally requires several parameters. This paper proposes two methods, one using a Recurrent Neural Network (RNN) and another using a Long Short-Term Memory (LSTM) network, that consider only the previous electricity consumption to estimate future electricity consumption. The aim is to assess the applicability of the RNN and the LSTM network to predicting electricity consumption.


Introduction
The rapid increase in electricity consumption requires accurate forecasting of electricity usage for distribution [1]. To predict electricity use accurately, it first had to be tracked, and Advanced Metering Infrastructure (AMI) was implemented for this purpose. AMI produces a large amount of data on energy consumption [2], and this data is used to forecast energy usage. Forecasting lets the national grid make decisions about power delivery. A reliable electricity consumption forecast helps prevent unplanned disturbances in the delivery of electricity [1,3,4,5]. AMI provides the basis for using the data for descriptive, predictive, and prescriptive analysis [2]. Energy demand depends on various factors, such as weather, occupancy, machine types, and the appliances used. The dependence on a large number of factors makes forecasting considerably more difficult. For efficient distribution, accurate predictions of electricity usage are essential. Nevertheless, including all the variables that affect electricity consumption can produce an unstable, unpredictable, and complex forecasting model [2]. Data-driven approaches to forecasting electricity usage are thus based on time series solutions [3]: electricity consumption is a time-dependent attribute, so the forecasting model can be built from the time series itself. The availability of past data supports time series based approaches, as it captures the time-dependent variations [6]. Electricity consumption forecasts are classified as short-term (hourly to weekly), mid-term (one week to one year), and long-term (more than one year) [2]. Time series techniques discussed in the literature include traditional and AI-based methods (ANN, ARIMA, SVM, fuzzy-based techniques) [3,5].
IOP Publishing, doi:10.1088/1742-6596/1716/1/012048
Past work shows that these techniques do well for short-term forecasting but are weak for mid- and long-term forecasting [3]. Work on mid-term to long-term forecasting reports relative errors approaching 40 to 50 percent [1]. Mid-term and long-term electricity consumption forecasting thus poses several challenges [3], which form the subject of this paper. This paper proposes two methods, one based on an RNN and one on an LSTM, to forecast electricity consumption for the short, mid, and long term. The RNN and the LSTM were used to forecast electricity consumption one day, three months, and thirteen months ahead. The RNN and LSTM are compared to the most common and popular predictive models of electricity consumption (ARIMA, ANN, and DNN). The results indicate that the proposed models reduced the root mean square error relative to the other models. The models were evaluated on the publicly available London Smart Meter dataset. The experiments covered predicting electricity consumption both for individual houses and for a block of houses. The LSTM and the RNN achieved an average Root Mean Square Error (RMSE) of 0.1 in all cases.

Related work
Smart grid data has been used for several energy consumption forecasting tasks [2]. The data is treated as sequential data. The most frequently studied methods are Autoregressive Integrated Moving Average (ARIMA) [7], Support Vector Machines (SVM) [8], linear regression [9], and Artificial Neural Networks (ANN) [10]. ARIMA is one of the most widely used methods for time series forecasting. ARIMA models have been used to forecast consumption in households and to estimate demand for office buildings [11]. ARIMA has also been used in Malaysia for short-term forecasting of half-hourly consumption [12]. The model results showed strong success for short-term predictions [11], and ARIMA models have been established as a high-performing short-term prediction solution [3]. SVM has also been used in electricity consumption forecasting [8]. Nevertheless, the non-linear models showed better results for short-term prediction [10]. A comparison of ANN, Multiple Regression (MR), Genetic Programming (GP), DNN, and SVM showed higher performance for ANN; the experiment also showed that, given the limited amount of data, the DNN provided results comparable to the other techniques tested [4].
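For reference, the ARIMA(p, d, q) model discussed above can be written in its standard textbook form. The text does not state the orders p, d, and q used in any of the cited studies, so the symbols below are generic and should not be read as the settings of any experiment reported here.

```latex
\[
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^{i}\Bigr)\,(1 - B)^{d}\, x_t
  \;=\; \Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr)\,\varepsilon_t ,
\qquad B\,x_t = x_{t-1}
\]
```

Here $x_t$ is the electricity consumption at time $t$, $B$ is the backshift operator, $\phi_i$ are the autoregressive coefficients, $\theta_j$ are the moving average coefficients, $d$ is the order of differencing, and $\varepsilon_t$ is a white noise term.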

Methodology
Deep learning can learn hidden patterns without a hand-crafted set of features, and it outperforms most machine learning and statistical methods on specific tasks [20,21,22,24]. Time series data holds a sequential pattern in which there are correlations between neighbouring instances of the data ($x_t$ depends on $x_{t-1}$ and affects $x_{t+1}$). Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTM), and memory networks can handle sequential data because of their capacity to retain past information.

Dataset description
The Household Power Consumption dataset is a multivariate time series dataset which describes the electricity consumption over four years for a single household. This archive includes 2075259 measurements obtained from December 2006 through November 2010 (47 months) in a house located in Sceaux (7 km from Paris, France).
It is a multivariate series comprising seven variables (besides the date and time), including:
• global_active_power: The total active power consumed by the household (kilowatts).
• global_reactive_power: The total reactive power consumed by the household (kilowatts).
• sub_metering_3: Active energy for climate control systems (watt-hours of active energy).

Data Preprocessing
To predict consumption for an individual home, the electricity consumption data is not pre-processed, since the aim is to concentrate on models that rely less on pre-processing. Data for a single house is separated from the electricity consumption of the blocks. To estimate the electricity consumption of a block, the data is pre-processed as follows. A block's electricity use is estimated in two ways: using the daily electricity usage of all households in the block, and using the average daily electricity use. The recording periods of the houses within a block are not consistent. Therefore, a specific time period that covers most of the houses in the block is chosen for block predictions. For this time period, the mean electricity consumption is computed for each day, so that a wider collection of houses contributes to the mean value. An example of the calculated mean value is given for block 36. The mean produces a single value per date for the block. Predictions were made using the most popular deep time series prediction models, RNN and LSTM.

The first graph shows the mean global active power by year, the second by quarter, the third by month, and the fourth by day. These plots confirm our earlier observations. The yearly mean was steady. By quarter, the lowest average power consumption occurred in the 3rd quarter. By month, the lowest average electricity use occurred in July and August. By day, the lowest average electricity use was around the 8th of the month. In the plot of global active power by year, the year 2006 was removed.
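The daily block mean described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pre-processing code: the function name `block_daily_mean`, the `(house_id, day, kwh)` record layout, and the sample readings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def block_daily_mean(readings, start, end):
    """Average the daily consumption over all houses of one block.

    readings: iterable of (house_id, day, kwh) tuples, with `day` given
              as an ISO date string such as "2013-01-01" (assumed layout).
    start, end: inclusive bounds of the common time period.
    Returns {day: mean kwh over the houses that reported on that day}.
    """
    per_day = defaultdict(list)
    for _house_id, day, kwh in readings:
        # ISO date strings sort chronologically, so string comparison works.
        if start <= day <= end:
            per_day[day].append(kwh)
    return {day: mean(values) for day, values in sorted(per_day.items())}


# Hypothetical readings: house_c only starts reporting on the second day.
readings = [
    ("house_a", "2013-01-01", 10.0),
    ("house_b", "2013-01-01", 14.0),
    ("house_a", "2013-01-02", 9.0),
    ("house_b", "2013-01-02", 11.0),
    ("house_c", "2013-01-02", 13.0),
    ("house_a", "2013-01-03", 8.0),
]
daily_mean = block_daily_mean(readings, "2013-01-01", "2013-01-02")
print(daily_mean)
```

Restricting the computation to a common period keeps days on which only a few houses report from dominating the block series.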

Training and testing data
The dataset is divided into training data and testing data. The test data is kept separate from the training data and is therefore unknown to the models until they are evaluated. The models were trained on 80% of all data and evaluated on the remaining 20%. The training and testing data are organized in three different ways. The training data was set up to learn the electricity usage of an individual house and of a block of houses.
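A minimal sketch of an 80/20 split is given below. The text does not say whether the split was chronological or random; the sketch assumes a chronological split, which is the usual choice for time series because it keeps future values out of the training set. The function name and the placeholder series are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Placeholder values standing in for 600 days of consumption.
daily_usage = [float(day) for day in range(600)]
train, test = chronological_split(daily_usage)
print(len(train), len(test))
```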

Figure 6. Forecasting plots
The plots above show one-day-ahead forecasts for each year. This is done for individual houses and for blocks of houses. The model is trained to take one day of input and forecast the electricity usage of the next day. The same scheme is applied at longer horizons: given one month, the model forecasts the next month, and given three months, it forecasts the next three months. Only 600 days are considered in order to obtain a balanced dataset. This approach is extended to predictions of the mean value for individual houses and blocks. The data is split in this way to show that the models are capable of learning short-, mid-, and long-term patterns. Short-term prediction is demonstrated by forecasting one day ahead, mid-term prediction by forecasting 3 months ahead, and long-term prediction by forecasting 13 months ahead. The output ($x_{t+1}$) is predicted from the input $x_t$. For a single house or block of houses, $x_t$ can be one day, three months, or 13 months of electricity use.
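The pairing of an input period $x_t$ with the following period $x_{t+1}$ can be sketched as below. This is an illustration only: the function name `make_windows`, the non-overlapping stride, and the use of 90 days to stand for three months are assumptions that the text does not confirm.

```python
def make_windows(series, window):
    """Pair each block of `window` values with the block that follows it.

    Returns a list of (x_t, x_next) tuples, where both elements are lists
    of length `window`. Consecutive pairs start `window` steps apart,
    which is an illustrative choice rather than a documented one.
    """
    pairs = []
    for start in range(0, len(series) - 2 * window + 1, window):
        x_t = series[start:start + window]
        x_next = series[start + window:start + 2 * window]
        pairs.append((x_t, x_next))
    return pairs


daily_usage = list(range(600))  # placeholder for 600 days of consumption
one_day = make_windows(daily_usage, window=1)        # short-term pairs
three_months = make_windows(daily_usage, window=90)  # mid-term pairs
print(len(one_day), len(three_months))
```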

RNN
The RNN is a deep learning model that learns from sequences [6]. The RNN takes the previous output ($y_{t-1}$) recursively and combines it with the new input ($x_t$). By using $y_{t-1}$, the current output of the RNN ($y_t$) learns from the previous sequence. Fig. 1 shows the structure of one RNN cell. The output of the preceding sequence is fed forward alongside the current input, so the past sequence affects $y_t$. The past sequence in turn carries a combination of earlier outputs. Thus, every output is influenced by sequential information that is carried along the sequence. Equation (1) gives the RNN recursion from $y_{t-1}$ and $x_t$. The weight applied to $y_{t-1}$ is $W_r$, and the weight applied to $x_t$ is $W_n$. In our experiment, the RNN has 100 hidden units sequentially linked to each other to achieve the best possible output.
$$y_t = \mathrm{sigmoid}(W_r\, y_{t-1} + W_n\, x_t) \qquad (1)$$
The RNN is trained for 300 epochs with a batch size of 20. The Adam optimizer is used for the optimization to achieve the highest accuracy.
The RNN architecture: $x_t$ is the current input and $y_{t-1}$ is the previous output, which is passed on and combined with $x_t$ to create the final output $y_t$.
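The recursion in Equation (1) can be sketched in plain Python as follows. This is a forward pass only, with random untrained weights. The 100 hidden units follow the text, while the single input feature, the weight range, the zero initial state, and all function names are assumptions made for illustration. Training with Adam is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rnn_step(y_prev, x_t, W_r, W_n):
    """One application of Eq. (1): y_t = sigmoid(W_r y_{t-1} + W_n x_t).

    y_prev: previous output, list of H floats
    x_t:    current input, list of D floats
    W_r:    H x H recurrent weights; W_n: H x D input weights
    """
    y_t = []
    for row_r, row_n in zip(W_r, W_n):
        z = sum(w * y for w, y in zip(row_r, y_prev))
        z += sum(w * x for w, x in zip(row_n, x_t))
        y_t.append(sigmoid(z))
    return y_t


def rnn_forward(inputs, W_r, W_n):
    """Run the recursion over a sequence, starting from y_0 = 0 (assumed)."""
    y = [0.0] * len(W_r)
    for x_t in inputs:
        y = rnn_step(y, x_t, W_r, W_n)
    return y


random.seed(0)
hidden, features = 100, 1  # 100 units as in the text; 1 feature is assumed
W_r = [[random.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(hidden)]
W_n = [[random.uniform(-0.1, 0.1) for _ in range(features)] for _ in range(hidden)]
sequence = [[0.2], [0.5], [0.1]]  # hypothetical scaled consumption values
y_final = rnn_forward(sequence, W_r, W_n)
print(len(y_final))
```

Because each step reuses the previous output, the final state depends on every earlier input in the sequence, which is the property the section describes.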

Figure 8. Long Short Term Memory Network (LSTM) cell architecture
Compared to the RNN, the LSTM has a more complex architecture. Figure 8 shows the architecture of an LSTM cell, which includes three gates used to process and retain past data [17]. The input $x_t$ is passed in and combined with $h_{t-1}$. The forget gate $f_t$ decides how much of $C_{t-1}$ is carried forward when the current output ($h_t$) is generated. Just as the RNN carries $y_{t-1}$ forward, the LSTM carries information forward via the cell state ($C_{t-1}$), but it decides which information is transmitted. In addition, unlike the RNN, the LSTM cell includes an input gate and an output gate to construct the final output. The cell state $C$ carries information from $h_{t-1}$ to the next time step. Unlike in the RNN, however, the content of $C$ is controlled by the LSTM. The LSTM therefore has control over $C$ and $h_t$ instead of producing the output directly. An LSTM cell is called a unit, and units can be connected to each other sequentially. The LSTM has 100 LSTM units, followed by one dense layer before the final output is produced. In the experiment, the LSTM was trained for 300 epochs to obtain the best results. The Adam optimizer was used to achieve the highest accuracy, with a batch size of 20. Figure 9 shows the LSTM structure used in the experiments.

Figure 9. The structure of the LSTM or RNN. The model has 100 hidden units. The model takes input $x_t$ and predicts $x_{t+1}$; $x_t$ and $x_{t+1}$ can be a day, a month, or 3 months.
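The gating described above can be illustrated with one step of a standard LSTM cell. The text does not give the gate equations, so the sketch uses the common textbook formulation and should not be taken as the exact cell used in the experiments. The parameter layout, the helper names, the single input feature, and the random untrained weights are all assumptions; the dense output layer and training are omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W v + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell (textbook form, assumed here).

    params maps "f", "i", "c", "o" to a (W, b) pair that acts on the
    concatenation [h_{t-1}, x_t].
    """
    v = h_prev + x_t  # list concatenation
    f_t = [sigmoid(z) for z in affine(*params["f"], v)]      # forget gate
    i_t = [sigmoid(z) for z in affine(*params["i"], v)]      # input gate
    c_hat = [math.tanh(z) for z in affine(*params["c"], v)]  # candidate state
    o_t = [sigmoid(z) for z in affine(*params["o"], v)]      # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def init_params(hidden, features, rng):
    """Random small weights and zero biases for the four gates."""
    def gate():
        W = [[rng.uniform(-0.1, 0.1) for _ in range(hidden + features)]
             for _ in range(hidden)]
        return W, [0.0] * hidden
    return {name: gate() for name in ("f", "i", "c", "o")}


rng = random.Random(0)
hidden, features = 100, 1  # 100 units as in the text; 1 feature is assumed
params = init_params(hidden, features, rng)
h, c = [0.0] * hidden, [0.0] * hidden
for x_t in [[0.2], [0.5], [0.1]]:  # hypothetical scaled consumption values
    h, c = lstm_step(x_t, h, c, params)
print(len(h), len(c))
```

The forget gate scales the previous cell state and the input gate scales the candidate, which is how the cell controls what is carried to the next time step.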

Data
The research is performed using the publicly accessible London Smart Meter dataset. This dataset is a refactored version of the data from the London data store. It consists of real-world electricity usage data from 5,567 meters in UK households. The dataset also contains weather for the given dates and information about each household (number of inhabitants, number of rooms, and so on). Only the energy consumption is used in the experiment. The data were collected from November 2011 until February 2014. The meter data is split into blocks, depending on where the house is located. Each block has approximately 15-20 homes. The data may be used to predict the consumption of electricity for individual houses or for a block of houses.

Results
Based on the prediction period, the results are classified into three categories: short-term, mid-term, and long-term prediction. In addition, forecasts are made for an individual house, for a block of houses, and for the daily mean value of a block of houses. The experiments were conducted at two levels: predicting the electricity consumption of a given household and predicting the electricity consumption of a block. The predictions are evaluated with the Root Mean Square Error (RMSE). The RNN and LSTM results are compared with those of ARIMA, Artificial Neural Networks (ANN), and Deep Neural Networks (DNN).
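For reference, the RMSE used for evaluation is the square root of the mean squared difference between actual and predicted values. A minimal sketch follows; the function name and the sample values are hypothetical and are not results from the paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values: three exact predictions and one that is off by 2.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
error = rmse(actual, predicted)
print(error)  # 1.0
```

Because the errors are squared before averaging, a single large miss raises the RMSE more than several small ones.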

Short term predictions
To demonstrate short-term forecasting, the models are used to forecast electricity usage one day in advance for a single house and for a block of houses.

Predicting electricity consumption for a given household
A randomly selected set of individual houses is used to demonstrate each model's performance. The findings are listed in Table 1. ARIMA shows better results for short-term forecasts than RNN, LSTM, ANN, and DNN. However, LSTM and RNN show performance very similar to ARIMA.

Predicting electricity consumption using all the houses in a block data
The first approach to estimating a block's electricity usage was to consider the energy consumption of all its homes: to forecast electricity usage for the entire block, all the values from each house in the block are used. The findings are listed in Table 2. As Table 2 shows, the RNN and the LSTM did not perform as well as they did when estimating individual houses. To achieve better results in predicting the electricity usage of a given block of houses, the mean consumption per day was computed. Table 3 shows the results of predicting the mean daily consumption for each block. Table 3 demonstrates a significant improvement over Table 2, and the daily mean provides a simple insight into the energy usage of a block of houses.

Mid-term Predictions
Models are trained on the past three months of electricity usage to forecast the next three months, in order to demonstrate the capability of mid-term prediction. The same set of houses and blocks was used so that the short-, mid-, and long-term forecasts can be compared consistently. Table 4 shows that, in contrast to the short-term predictions, ARIMA did not perform as well as the other approaches in mid-term predictions. All three deep neural networks (DNN, RNN, and LSTM) outperformed ARIMA and ANN. These findings indicate that the deeper models outperformed the simpler ones.

Predicting electricity consumption using all the houses in a block data
The same blocks that were used for short-term predictions are used here for consistency. Table 5 shows that, as with the estimation of individual household electricity usage, the deep learning models outperformed the rest of the models. However, the LSTM and RNN also outperformed the DNN.

Long-term Predictions

Predicting electricity consumption for a given household
The results of long-term forecasts for a given household are shown in Table 7. RNN and LSTM predict with greater accuracy than the other models.

Predicting electricity consumption using all the houses in a block data
As for the individual home, electricity consumption for the block is forecast 13 months ahead. The LSTM and RNN outperformed the rest of the models. The results also show that ARIMA was outperformed by all the neural network models.

Conclusion
The paper focuses on models that can be used to forecast short-, mid-, and long-term electricity usage for an individual house and a block of houses. The paper compares ARIMA, RNN, and LSTM through experiments on the publicly available London Smart Meter dataset. Although ARIMA demonstrated good performance for short-term forecasts, its performance relative to the other models declines as the forecast horizon increases. RNN and LSTM demonstrated performance comparable to ARIMA for short-term predictions while outperforming all other models in mid- and long-term predictions. The results on the London Smart Meter dataset indicate that the RNN and LSTM are capable of forecasting short-, mid-, and long-term electricity usage with high accuracy.