Developing a Data Mining Based Model to Extract Predictor Factors in Energy Systems: Application of Global Natural Gas Demand

Recently, the global natural gas (NG) market has attracted much attention because NG is cleaner than oil and, in most regions, cheaper than renewable energy sources. However, price fluctuations, environmental concerns, technological development, emerging unconventional resources, energy security challenges, and shipment are some of the forces that have made the NG market more dynamic and complex. From a policy-making perspective, it is vital to uncover future demand-side trends. This paper proposes an intelligent forecasting model to forecast global NG demand while investigating a multi-dimensional, purified input vector. The model starts with a data mining (DM) step to purify the input features, identify the best time lags, and pre-process the selected input vector. Then a hybrid artificial neural network (ANN) equipped with a genetic optimizer is applied to set up the ANN's characteristics. Among 13 available input features, six features (Alternative and Nuclear Energy, CO2 Emissions, GDP per Capita, Urban Population, Natural Gas Production, and Oil Consumption) were selected as the most relevant via the DM step. Then, the hybrid learning prediction model is designed to extrapolate future consumption trends. The proposed data mining genetic-neural network (DmGNn) model outperforms competing models on five error-based evaluation statistics: R2, MAE, MAPE, MBE, and RMSE. In addition, because the model proposes the best input feature set, its results were compared with those of the model that used the raw input set, with no DM purification process. The comparison showed that DmGNn dramatically outperformed a simple GNn. Also, a looser prediction model, such as a generalized neural network with purified input features, obtained a larger R2 indicator (=0.9864) than the GNn (=0.9679).

energy supply chain due to economic feasibility. According to the International Energy Agency's (IEA) 2016 report, fossil fuels in the form of liquid fuels, natural gas, and coal account for more than 80% of world energy consumption [4]. Ease of utilization, higher performance compared to traditional energy sources, ease of transport by land or sea, and affordable extraction costs have established oil and natural gas (NG) as strategic commodities [5,6]. However, emerging ecological concerns and the rethinking of a more peaceful future (sustainable development goals) have drawn attention to climate change challenges (such as greenhouse gas emissions and global warming) [7]. These two conflicting objectives, development and the increasing need for energy supply on the one hand and global environmental concerns on the other, have led researchers to study energy systems and develop different plausible future perspectives.
Despite successful efforts, the main problem still exists, and is defined as "discovering reliable future trends and probable alternative futures in the field of energy systems and uncover the most influencing driving forces to aid energy management process". This paper aims to develop an intelligent learning-based prediction model equipped with data mining (DM) techniques to purify and set up the input vector. The DM step is used to select and organize the best input features that represent patterns of future global NG demand trends. Although many previous studies have successfully addressed the global NG demand prediction problem, we attempt to uncover the most effective driving forces as input features and to analyze how they affect the objective function (global NG demand prediction). For example, the proposed model studies the time relation between the input variables and the target variable. As a result, a lower-dimensional input set is available to policymakers, which simplifies decision-making and makes it more reliable.
Because it is influenced by a series of variables and oscillating time series, the NG forecasting problem is very challenging [8]. In recent years, considerable effort has been devoted to artificial intelligence (AI) models, or to the integration of several models (hybrid models), for prediction problems in order to increase accuracy and model reliability [9,10]. Numerous notable studies have also investigated demand prediction for energy resources [11][12][13][14][15][16]. The prediction performance of the CDA model exceeded that of earlier neural networks (NN) and an engineering-based model. Baumeister and Kilian published a research paper analyzing how vector autoregression (VAR) models form policy-relevant forecasting scenarios in the case of the oil market. The model investigates the influence of changes in the probability weights of scenarios on real-time oil price forecasting [17]. Dilaver et al. investigated NG consumption in Europe to support long-term investments and contracts [18]. They estimated OECD-Europe NG demand trends from annual time series covering the period from 1978 to 2011 by applying a structural time series model (STSM). Finally, three scenario streams were developed based on business-as-usual, high, and low case scenarios.
Li et al. used dynamic system models to create possible outlooks to 2030 for China's NG consumption growth. Then, to assess the accuracy of the results and propose policy recommendations on NG exploration and the development of China's NG industry, a scenario analysis step was applied [19]. Suganthi and Samuel provided a comprehensive review of energy models that attempt to forecast the demand function [20]. The authors classified prediction models and showed that most recent research relied on quantitative models that result in a single future prediction. These models used statistical error functions to estimate accuracy in comparison with other models. However, as mentioned above, data-driven models may neglect a set of influential qualitative variables. On the other hand, projecting alternative futures based on qualitative approaches is challenging, especially with respect to validation, and such approaches are strongly affected by the expert group (the number of experts and the validation of their judgments). To present a broad review and to provide insights into the prediction approaches used by previous studies, table 1 summarizes the models used to address the energy consumption prediction problem.
Table 1. Analyzing previous studies, based on their approaches to address energy consumption prediction problem

Type of models: pros & cons

Classic price modeling/forecasting
• Focus on historical data.
• Do not consider jumps and dips in prices.
• Generally, these models were introduced for stock markets.
• Do not use the unit root test for time series and econometric methods to estimate their parameters.

Time series models
• Focus on historical data.
• Do not cope with extreme jumps and dips.
• Do not use feedback loops to dynamically upgrade model adjustment features.

Learning forecasting models
• Focus on historical data.
• Are generally able to learn fluctuations and related earlier signals.
• Use feedback loops to upgrade model adjustment features dynamically.

Qualitative based forecasting models
• Rarely depend on quantitative forecasting methods.
• Provide insights about long-run behaviors of a complex system.
• Mostly depend on experts' evaluations, instead of historical time series.
• Can dynamically modify input features.
In this paper, we aim to propose a learning-based model designed to provide more reliable and relevant input features (driving forces) for initializing a hybrid prediction ANN, so that the decision-making process is supported by accurate and reliable forecasts. The following section presents the proposed methodology and briefly describes its steps. Section three presents the implementation phase and discusses the results to show how the proposed methodology outperforms other benchmark models. Finally, section four provides summaries and conclusions.

The methodology of research:
As noted previously, this research aims to develop a data mining based prediction model. The main phases and steps of the proposed methodology are as follows:

PHASE 1.
Step1. Data gathering: in this step, previous studies are reviewed to identify raw input features. Unlike most previously published research, this paper pursues the maximal approach, meaning that we gather and use the maximum available input data to ensure that the developed model does not neglect a possible solution. In simple words, the proposed methodology does not limit the solution space through the use of a confined set of input features. Output: input feature set.

PHASE 2.
Step2. Feature selection: this step is designed to select the most relevant subset of the gathered features. The main target is to reduce the problem dimensions while preserving all local optimal solutions. Output: refined input feature sub-set.
Step3. Time lag selection: this step investigates how different time lags for the input features may affect forecasting accuracy. It studies the time relation between each input variable and the target variable. Output: timed input features sub-set.
Step4. Normalization: different scales of input features may result in a biased final forecasting model. This step rescales the input features to similar, uniform scales. Output: uniformed timed input features sub-set.

PHASE 3.
Step5. Design of the forecasting model: in this step, an ANN is equipped with a GA in order to optimize the network's characteristics and develop an accurate prediction model. Output: prediction framework.

PHASE 4.
Step6. Implementation: the finalized input features are applied to the prediction framework. In this step, the input set is divided into two main portions, one to train the prediction framework and the other to test its performance. Outputs: adjusted prediction model & obtained extrapolated results.

PHASE 5.
Step7. Validation: this step is dedicated to comparing the results obtained by the proposed prediction framework with those of other benchmark models. Output: output/accuracy analysis.
To model complex systems (like ours), selecting a robust model architecture is very challenging [75,76]. The DM approach is selected to handle the complexity of the input variables. DM is defined as the process of extracting appealing patterns and deriving knowledge from massive datasets. As Han et al. noted: "the principal dimensions are data, knowledge, applications, and technologies" [77].

Data gathering and data pre-processing:
Input data remarkably affect the accuracy and quality of the obtained results. In the case of energy consumption, previous research has investigated different sets of input features to predict upcoming trends. A significant limitation of a prediction model is that it cannot reflect the effects of variables that are absent from the input feature set (those that have been neglected). To ensure the robustness and validity of the proposed prediction model, this paper proposes the maximal approach, which means investigating all available input data and reducing the dataset dimension through a DM technique. This approach has the advantage of retaining all signals and trends, but the model simultaneously faces an undeniable challenge: the increased complexity caused by the large input set, which may negatively affect prediction efficiency. In addition, it is challenging to set up strategic decisions based on an extensive collection of parameters/inputs. To handle this problem, this paper proposes a DM based data pre-processing step to examine and purify the input features. Table 3 summarizes the input features most frequently used by other researchers and the features that were available/accessible online.
In machine learning problems, it is very challenging to select a representative collection of features to build the model [93]. Studying more features (a larger feature set) helps to explore more problem dimensions and reduces the threat of missing potential solutions, but at the same time it may lead to greater computational complexity, confusion of the learning algorithm, and overfitting.
DM, as a process, generally comprises data cleaning, integration, selection, and transformation to discover patterns, evaluate them, and present the extracted knowledge [77,94]. In knowledge discovery processes such as DM, feature subset selection is crucial, not only for the insight gained from determining the variables, but also for the improved comprehensibility, scalability, and validity of the constructed models [95]. This research uses a correlation-based feature selection (CFs) algorithm to determine the most relevant input features. CFs was initially proposed by Hall in 1999 [93]. The key idea of CFs is that the selected features should be highly correlated with the prediction class (target variable), yet remain uncorrelated with each other [93]. The "Best First" [96] and "Greedy Stepwise" [97] search methods were applied with CFs to study the input dataset under different search paradigms. Both search methods resulted in the same feature subset, which means that they support each other. Finally, out of the 13 representative input features (presented in table 3), 6 input features were selected as the model's input: (1) alternative and nuclear energy, (2) CO2 emissions, (3) GDP per capita, (4) urban population, (5) NG production, and (6) oil consumption.
Sometimes important features in a time series dataset show their influence only after a time lag. There may also be time lags before a policy/decision takes effect in the complex energy market. Detecting the relevant lags helps a prediction model to follow possible fluctuations accurately [76]. At this step, the proposed DmGNn methodology attempts to determine the time lags of the finalized feature subset with respect to the target attribute (i.e., global NG demand).
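The idea behind correlation-based feature selection can be illustrated with a short sketch. It assumes Pearson correlation as the correlation measure and a plain greedy forward search; the function names `cfs_merit` and `greedy_forward_cfs` are hypothetical and this is not the implementation used to produce the results of this paper. The merit of a subset of k features is k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-target correlation and r_ff the mean feature-feature correlation.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff).
    Pearson correlation is an assumption made for this sketch."""
    k = len(subset)
    # Mean absolute correlation between each selected feature and the target.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute correlation over all distinct pairs of selected features.
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_forward_cfs(X, y):
    """Add the feature that most improves the merit until no feature helps."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: cfs_merit(X, y, selected + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:
            break  # no candidate improves the merit any further
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best
```

In this sketch a feature that merely duplicates an already selected one raises r_ff without raising r_cf, so the merit drops and the duplicate is left out, which matches the stated goal of relevant but mutually uncorrelated features.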
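As a rough illustration of how a lag between one input feature and the target can be detected, the sketch below screens candidate lags by cross-correlation. This is an illustrative stand-in and not necessarily the lag selection procedure used in this paper; the function name `best_lag` and the `max_lag` parameter are hypothetical.

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Return the lag in 0..max_lag at which x(t - lag) correlates most
    strongly (in absolute value) with y(t), plus the score of every lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = {}
    for lag in range(max_lag + 1):
        x_lagged = x[: len(x) - lag]   # x(t - lag)
        y_aligned = y[lag:]            # y(t), aligned with the lagged input
        scores[lag] = abs(np.corrcoef(x_lagged, y_aligned)[0, 1])
    return max(scores, key=scores.get), scores
```

Each selected feature would then be shifted by its detected lag before being passed to the prediction model, so that the input observed at time t − lag is paired with the demand observed at time t.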

Numerous lag selection approaches exist, which treat lag selection as a pre-processing step, a post-processing step, or even a part of the learning process [98]. Among the popular pre-processing lag selection methods that rely on statistical tests based on information criteria, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the Schwarz Bayesian information criterion (SBIC) are widely used [99,100]. Information criteria methods consider lags from 1 (the minimum number) to p, which defines the intermediate lags. The main hypothesis is to choose the lag order p that minimizes the selected information criterion.
Although an optimum set of input features has been selected, the input features are still asymmetric and their units differ in scale. A data normalization step is applied to restrain the influence of parameter ranges on the results and to map the values of features with different domains and scales onto a shared scale. The "min-max" normalization method is used to adjust the dataset using the following equation:
$$\text{Normalized Data} = \frac{y(i) - \min\{y\}}{\max\{y\} - \min\{y\}}$$
where y(i) is the i-th element in the column, and min{y} and max{y} are respectively the minimum and the maximum of the related column's elements.
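The min-max normalization can be sketched in a few lines. The function name `min_max_normalize` is hypothetical, and the handling of constant columns (mapped to 0 to avoid division by zero) is an assumption that the text does not address.

```python
import numpy as np

def min_max_normalize(data):
    """Rescale each column to [0, 1] via (y - min{y}) / (max{y} - min{y})."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    span = col_max - col_min
    # Assumption: a constant column has zero range, so it is mapped to 0.
    span = np.where(span == 0, 1.0, span)
    return (data - col_min) / span

# Two features on very different scales end up on the same scale.
scaled = min_max_normalize([[1, 10], [2, 20], [3, 30]])
```

In a forecasting setting, the minimum and maximum would normally be computed on the training portion only and then reused for the test portion, so that no information from the test period leaks into training.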
The next sub-section is dedicated to discussing the forecasting framework.

Artificial neural network:
Computational intelligence methods such as artificial neural networks (ANNs) [102] are modern paradigms for handling complex optimization problems [103][104][105]. An ANN is organized as a simplified abstraction of the biological nervous system that emulates the mechanism of neurons. A neuron is the computation unit of an ANN. Mathematically, a neuron is a function that aims to dynamically reduce the deviation cost. In the mathematical description of a neuron, xi and oj are respectively the input and the output at time t, and τij defines the delay between xi and oj. Tj is the threshold of the j-th neuron, while wij is the connection coefficient from neuron i to neuron j. An ANN is characterized by the following elements: the input layer, the hidden layer, the interconnections between layers, the learning step that finds the optimum values of the interconnection weights, the transfer function that produces outputs from the weighted inputs, the number of neurons in each layer, and the output layer. Fig 3 schematically presents the architecture of an ANN with a single hidden layer.
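One common way to write a neuron that is consistent with the symbols defined above, offered here as an assumed form rather than as the authors' exact equation, is:

$$o_j(t) = f\left(\sum_{i} w_{ij}\, x_i(t - \tau_{ij}) - T_j\right)$$

Here f denotes the transfer (activation) function; the symbol f is introduced for this sketch. The neuron sums its delayed inputs weighted by the connection coefficients, subtracts its threshold, and passes the result through f.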

Fig3. A simple artificial neural network (labelled layers: Input Layer, Hidden Layer(s))
As shown in fig. 3, neurons are arranged in layers. Nodes of consecutive layers are connected to represent interactions and information flow in an ANN. The connection between nodes i and j is defined by the weight wij, and a bias parameter bi is assigned to each neuron [106]. To minimize the error at each step (known as an epoch), an ANN computes an error function and uses an algorithm to reduce the error value.
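The information flow through such a network can be sketched as a forward pass followed by an error computation. The tanh transfer function, the linear output layer, the mean squared error, and the names `forward` and `mse` are assumptions made for illustration; the layer sizes echo the six selected features and the four hidden neurons reported in this paper.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass of a single-hidden-layer network."""
    hidden = np.tanh(W1 @ x + b1)   # weighted inputs plus bias, then transfer function
    return W2 @ hidden + b2         # linear output layer (assumption)

def mse(pred, target):
    """Mean squared error, one possible error function."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 6, 4, 1
W1 = rng.normal(size=(n_hidden, n_in))   # weights: input -> hidden
b1 = np.zeros(n_hidden)                  # hidden biases
W2 = rng.normal(size=(n_out, n_hidden))  # weights: hidden -> output
b2 = np.zeros(n_out)                     # output bias
x = rng.uniform(size=n_in)               # one normalized input vector
y_hat = forward(x, W1, b1, W2, b2)
```

Training then consists of repeatedly adjusting W1, b1, W2, and b2 so that the error between y_hat and the observed target decreases.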

An ANN can be trained in order to build a precise network and minimize the loss function by adjusting the wij weight matrices [76]. The performance of the learning algorithm therefore determines the performance of the ANN. In this paper, a genetic algorithm (GA) is used as the learning algorithm of the ANN. The GA procedure is explained briefly in the next section.

Genetic algorithm:
Training an ANN is a complex task that directly influences the quality of the outcomes. Recently, numerous academic studies have applied meta-heuristic and intelligent algorithms (e.g., GA) as learning algorithms [107].
GA is an evolutionary optimization approach developed by Holland in 1975 [108] that is based on a random search procedure. Compared to traditional optimization methods, the GA has numerous advantages. For example, the algorithm converges to a good, feasible solution faster than other existing traditional methods [11]. A series of computational operators, such as selection, crossover, and mutation functions, is used in a GA to achieve a reliable solution.

Selection and crossover: select solutions from the generated solution set (refer to step 2) to exchange genes following the crossover strategy.
4. Mutation: select solutions from the generated solution set (refer to step 2) to exchange genes following the mutation strategy.
Evaluation: compute the fitness value F of each solution; find and save the best solution D_best of the n-th iteration.
Output: the best solution, D_best = min{D_best of the n-th iteration, n = 0 .. N_x}.

Genetic neural network:
In this paper, the weights and thresholds of the ANN are updated by a GA. For this purpose, the input vectors are encoded as genes in the form of a chromosome. The initial population is then formed from randomly generated chromosomes. Next, the parameters of the optimization algorithm, such as the selection, crossover, and mutation rates, are set to design the algorithm. The fitness function is the reciprocal of the quadratic sum of the differences between the predicted and real values [109]. Roulette wheel selection is used to select new individuals, and then two chromosomes exchange genes via the crossover operation to generate a new individual. Finally, the mutation step is applied to avoid premature convergence.
Equipping an ANN with a GA could save training time and improve the precision of the forecasting model [109].
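The procedure described above can be sketched as follows: each chromosome holds all weights and thresholds of a small network, the fitness is the reciprocal of the sum of squared errors, parents are drawn by roulette wheel selection, and crossover and mutation produce the next population. The network sizes, the rates, the Gaussian mutation, the single-point crossover, the elitism step, and all function names are assumptions made for illustration, not the configuration used in this paper.

```python
import numpy as np

N_IN, N_HID = 2, 4                                # illustrative network sizes
N_GENES = N_HID * N_IN + N_HID + N_HID + 1        # all weights and thresholds

def predict(chrom, X):
    """Decode a chromosome into a 1-hidden-layer network and run it on X."""
    i = N_HID * N_IN
    W1 = chrom[:i].reshape(N_HID, N_IN)
    b1 = chrom[i:i + N_HID]
    W2 = chrom[i + N_HID:i + 2 * N_HID]
    b2 = chrom[-1]
    return np.tanh(X @ W1.T + b1) @ W2 + b2

def sse(chrom, X, y):
    """Quadratic sum of the differences between predicted and real values."""
    return float(np.sum((predict(chrom, X) - y) ** 2))

def evolve(X, y, pop_size=40, generations=60, p_cross=0.8, p_mut=0.1, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.normal(0.0, 1.0, size=(pop_size, N_GENES))
    history = []                                   # best error per generation
    for _ in range(generations):
        errors = np.array([sse(c, X, y) for c in pop])
        fitness = 1.0 / (errors + 1e-12)           # reciprocal of the error
        elite = pop[np.argmin(errors)].copy()
        history.append(errors.min())
        # Roulette wheel: selection probability proportional to fitness.
        idx = rng.choice(pop_size, size=pop_size, p=fitness / fitness.sum())
        parents = pop[idx]
        children = parents.copy()
        # Single-point crossover between consecutive pairs of parents.
        for a in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, N_GENES)
                children[a, cut:] = parents[a + 1, cut:]
                children[a + 1, cut:] = parents[a, cut:]
        # Gaussian mutation on a random subset of genes.
        mask = rng.random(children.shape) < p_mut
        children[mask] += rng.normal(0.0, 0.3, size=mask.sum())
        children[0] = elite                        # elitism (assumption)
        pop = children
    errors = np.array([sse(c, X, y) for c in pop])
    history.append(errors.min())
    return pop[np.argmin(errors)], history
```

Because the best chromosome of each generation is copied unchanged into the next one, the best error in this sketch never increases from one generation to the next.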

Fig5. Flowchart of a GNN
The next section presents the architecture of the ANN, which is the basic framework of the developed forecasting model.

The architecture of the ANN:
This research aims to produce accurate NG demand predictions, so the selected features are fed into the first layer (input layer) of the designed ANN. A network with a single hidden layer was designed to perform the prediction, so the model has a three-layer architecture. Fig6 shows the performance of a three-layered NN with three, four, five, six, and seven neurons in the hidden layer. Four neurons were used for the hidden layer, as this configuration returned the best performance among the tested numbers of neurons (see fig6).

Fig6. Performance of DmGNn for different numbers of neurons (A: R2 statistic for the different number of DmGNn neurons; B: RMSE statistic for the different number of DmGNn neurons)
As shown in fig 6, based on the R2 and root mean square error (RMSE) statistics, the proposed data mining genetic-neural network (DmGNn) model performs better with four hidden neurons than with the other examined configurations.

Outputs and Results:
As mentioned before this paper is aimed at developing a forecasting model to accurately forecast global NG demand.

Fig7. Performance of the proposed DmGNn model for the training and testing data sets
Learning models have been applied extensively to NG demand prediction [49,110]. Several competing prediction models were selected to compare their outputs with those of the proposed model and to analyze its accuracy. Adaptive Neuro-Fuzzy Inference Systems (ANFIS) [111][112][113] and a set of classical, well-known neural network based techniques, namely the Radial Basis Function Neural Network (RBF) [114,115], the Multi-Layered Perceptron (MLP) [116,117], and the Generalized Regression Neural Network (GRNN) [118][119][120], were selected and optimized (through trial and error) to assess the accuracy of the proposed DmGNn model in a comparison study.
To evaluate the different models, a set of mathematical criteria was organized to measure prediction performance. A relatively large set of validity indicators supports the justification of a model's usage [8]. These statistics are summarized in table 4. Each model was run 10 times and the average of the outputs was calculated. Table 5 presents the performance of the proposed and competing models with respect to the statistics introduced in table 4.
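The five error statistics named in this paper (R2, MAE, MAPE, MBE, and RMSE) can be computed as in the sketch below. The formulas are the commonly used textbook definitions and are assumptions here: R2 is taken as one minus the ratio of residual to total sum of squares, MAPE is expressed in percent, and MBE uses the sign convention predicted minus actual. The function name `error_statistics` is hypothetical.

```python
import numpy as np

def error_statistics(actual, predicted):
    """Common definitions of R2, MAE, MAPE (%), MBE, and RMSE (assumed forms)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual                        # sign convention is an assumption
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err / actual)),
        "MBE": np.mean(err),
        "RMSE": np.sqrt(np.mean(err ** 2)),
    }

stats = error_statistics([100, 200, 300, 400], [110, 190, 310, 390])
```

In this toy example the errors alternate in sign, so MBE is zero even though MAE and RMSE are not, which is why a bias statistic is usually reported together with magnitude-based statistics.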

Table5. Statistical errors for each prediction model
As shown in table 5, the proposed DmGNn significantly outperforms the competing models. The pattern of the absolute error of each model is shown in Fig 8, which illustrates how the forecasting models behave over the test period. The proposed DmGNn outperforms the other benchmark forecasting models (with lower absolute error over the forecasting period) and produces a robust forecast series (unlike the other forecasting models, DmGNn's forecast errors show a low-swing pattern). To show the efficiency of the data mining phases, both pre-processed and raw data were applied to the designed forecasting model. Fig 9 and table 6 compare the results.

Conclusion:
Energy is a major topic in both practice and theory, and many researchers have investigated issues related to energy sectors and industries. The international energy supply system is characterized by a complicated combination of technological, social, economic, and political elements. Predicting and planning for the future global energy market is an interesting and, at the same time, challenging subject in both research and practical investment projects. Accurate prediction of energy demand is therefore critical for developing future policies, modifying current plans, and evaluating potential strategies. The primary aim of this paper was to provide an accurate and robust prediction model for global natural gas demand. In addition, the authors aimed to introduce a process that reduces the dimensions of the problem space in order to identify the most relevant features affecting future NG consumption trends, so that policymakers can monitor and manage the NG market with reference to the extracted features.
In order to investigate the maximum set of feasible solutions and to avoid missing any potential optimal solution, all available input features were gathered based on the literature review and a survey of related online datasets. Input features define the model structure and support the accuracy of the output results. However, increasing the number of input variables may cause computational complexity and reduce the interpretability of the results. On the other hand, a large number of input features expands the solution space and consequently reduces the probability of ignoring appropriate answers. A feature selection step is proposed and implemented to reduce the dataset dimensions while guaranteeing that the prediction model will explore all optimal solutions. Finally, 6 input features were selected from the 13 primary input features. The feature selection approach guarantees that the whole solution space is investigated using a limited set of input features. Possible time lags between the input features and the target attribute (global NG demand) were then studied and subsequently applied to the refined input set.
Investigating suitable time lags results in a more accurate and rational prediction model, which guarantees synchronization between the input features and the target attribute at time step t. Finally, a neural network framework was developed and equipped with a genetic algorithm to optimize the network's characteristics, with the aim of predicting future global NG demand.
Four comparative models were investigated to study the performance of the proposed data mining genetic-neural network (DmGNn) model. The proposed DmGNn model outperforms the other benchmark models with respect to 5 different error statistics. Based on the R2 statistic, the DmGNn tracks the fluctuations of the real testing set very well (it missed only about 2%). Moreover, to determine how the proposed pre-processing step affects model accuracy, the DmGNn model was compared with a single GNn (without the pre-processing step). As shown, the proposed pre-processing step improves predictions in terms of both accuracy and reliability (robustness). Moreover, based on the interpretative capability index, the DmGNn provides a clearer view of future trends since it uses a smaller input dataset. A limited input feature set enables decision makers to design responsive policies/strategies/actions, as they are aware of the attributes affecting global NG demand.
The proposed DmGNn is characterized by high flexibility, universal operation, learning ability and low requirements for computation resources. As a result, it can be used by decision makers and market participants who face a complex environment.