A comparative analysis to forecast carbon dioxide emissions

Despite growing awareness of and commitment to addressing climate change, carbon dioxide (CO2) emissions continue to rise dramatically across the planet. In recent years, the consequences of climate change have become more catastrophic and have attracted widespread global attention. CO2 emissions from the energy industry have lately been highlighted as one of the most pressing concerns for all countries. In its first part, this paper examines the relationships between CO2 emissions, electrical energy consumption, and gross domestic product (GDP) in Bangladesh from 1972 to 2019. For this purpose, we applied the fully modified ordinary least squares (FMOLS) approach. The findings indicate that CO2 emissions, electrical energy consumption, and GDP have a statistically significant long-term cointegrating relationship. Developing an accurate CO2 emissions forecasting model is crucial for tackling the problem. This leads to the second step, which formulates CO2 emissions forecasting as a multivariate time series problem that accounts for these influential factors. Four deep learning algorithms for multivariate time series prediction are analyzed in this work: the convolution neural network (CNN), CNN long short-term memory (CNN–LSTM), long short-term memory (LSTM), and dense neural network (DNN). The root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are used to analyze and compare the performance of the predictive models. The prediction errors in MAPE of the CNN, CNN–LSTM, LSTM, and DNN are 15.043, 5.065, 5.377, and 3.678, respectively. After evaluating those deep learning models, a multivariate polynomial regression has also been employed to forecast CO2 emissions. It achieves nearly the same accuracy as the LSTM model, with a MAPE of 5.541. © 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Environmental issues are among the most pressing concerns confronting modern civilization. The principal cause of climate change is the production of greenhouse gases (GHGs), which mostly comprise carbon dioxide (CO2) (Mitić et al., 2017). The effects of climate change can now be observed in every corner of the globe. Bangladesh is one of the countries most vulnerable to the climatic change caused by global warming (Sarkar et al., 2015), and owing to environmental concerns it is anticipated to lose 3.4% of its 2015 gross domestic product (GDP) every year (Hasan and Chongbo, 2020). About 75% of the human-caused CO2 in the last 20 years originated from fossil fuel combustion. However, fossil fuels comprise the dominant source of electricity, manufacturing activity, transportation, and consumption of goods and services, all of which autoregressive distributed lag model (ARDL) and Johansen cointegration were used to examine the relationship between CO2 emissions, energy consumption, and economic progress both in bivariate and multivariate models for 1971-2019 in Pakistan. These findings indicate a positive and significant impact of a fast-growing economy and energy consumption on CO2 emissions. Khobai et al. (2017) reviewed the significant correlation between GDP and energy usage in South Africa. According to their study, there is two-way causality between economic progress and energy use. In Turkey, Gökmenoğlu and Taspinar (2016) discovered a one-way statistical relationship between economic progress and energy usage. Ghosh et al.
(2014) scrutinized the interconnection between economic growth, CO2 emissions, and energy consumption in Bangladesh and found that energy consumption has a positive and significant influence on economic growth, whereas CO2 emissions have a negative and minor impact. Using the fully modified ordinary least squares (FMOLS) and Johansen cointegration approaches, Mirza and Kanwal (2017) found that population density and energy usage had a substantial influence on environmental degradation in Pakistan. Their robustness checks likewise indicate that Pakistan's population density and energy consumption contribute to CO2 emissions. For Bangladesh, Sarkar et al. (2018) investigated the trends in energy consumption and CO2 emissions. They showed that the growth rate of CO2 emissions in Bangladesh is higher than that of GDP and energy consumption. Cai et al. (2018) utilized the ARDL bounds test to determine the relationship between renewable energy usage, economic growth, and CO2 emissions for the Group of Seven (G7) countries. When real GDP per capita and CO2 emissions are employed as dependent variables, cointegration is found to exist in the G7 countries. Salari et al. (2021) inspected the relationship between CO2 emissions, energy consumption, and GDP at the state level in the United States from 1997 to 2016. For both static and dynamic models, the results show a long-run relationship between various types of energy consumption and CO2 emissions. Wasti and Zaidi (2020) examined the relationship between CO2 emissions, energy consumption, GDP, and trade liberalization in Kuwait. Their results reveal unidirectional causation between GDP and CO2 emissions and between energy usage and trade liberalization. A study by Valadkhani et al.
(2019) examined the contribution of several main energy sources (oil, coal, natural gas, hydroelectricity, and other renewables) to global CO2 emissions. For 11 Asian states from 1960 to 2014, Rahman (2017) analyzed the correlation between CO2 emissions, energy consumption, economic development, exports, and population density to evaluate the long-term consequences for CO2 emissions using FMOLS and dynamic ordinary least squares (DOLS) techniques. Based on the test findings, these five factors appear to be cointegrated over time. Studies by Shahbaz et al. (2013) in Indonesia and Ang (2008) in France and Malaysia have shown that economic expansion affects energy consumption and CO2 emissions. Several recent studies, such as Saint Akadiri et al. (2019) and You and Lv (2018), have used the KOF index to investigate the effect of globalization on CO2 emissions. In addition to these, a summary of some recent studies on CO2 emissions is shown in Table 1.
This literature shows that CO2 emissions can be influenced by a variety of factors and that forecasting CO2 emissions is important. As a result, various studies have recently developed new models to predict CO2 emissions, and others are under way. According to Li et al. (2020), China has paid close attention to modeling and forecasting its CO2 emissions since the Copenhagen climate summit. Various viewpoints and forecast outcomes have emerged on China's ability to meet its pledge to reduce CO2 intensity. The prediction model of Yuan et al. (2012) showed that CO2 emissions would be reduced by 45% if China maintains annual economic growth rates of 7% and 6% in the 12th and 13th five-year plan (FYP) periods, respectively. Using the logarithmic mean Divisia index (LMDI) decomposition method, Islam et al. (2021) predicted that CO2 emissions in Bangladesh would peak at 58.97 Mtoe by 2040. Heydari et al.
(2019) used a general regression neural network (GRNN) and grey wolf optimization (GWO) to assess the trend of CO2 emissions in Iran, Canada, and Italy, and the findings show that the suggested method is more accurate in long-term CO2 emissions forecasting. Fang et al. (2018) utilized an enhanced Gaussian process regression model to forecast CO2 emissions and found that China's overall CO2 emissions would continue to rise, but at a slower rate, and that the US and Japan would soon have their CO2 emissions under control. Hosseini et al. (2019) used time series and regression analysis to forecast Iran's CO2 emissions in 2030. According to their findings, Iran is unlikely to meet its Paris Agreement goal under business-as-usual (BAU) assumptions. Moreover, Ofosu-Adarkwa et al. (2020) projected CO2 emissions from the Chinese cement sector, and their proposed system (hybrid Verhulst-GM(1, N)) can forecast emissions with a 97% precision. According to Ameyaw et al. (2019), the long short-term memory (LSTM) based projection of CO2 emissions from combustion in China shows a declining trend up to 2030. Their analysis also stated that, if renewable energy investments are not increased, countries' intended nationally determined contributions (INDCs) will be jeopardized. Li (2020) utilized LSTM, support vector machine (SVM), and kernel least squares (KLS) models to forecast CO2 emissions in China, with the findings demonstrating that KLS was more accurate than the other techniques. Huang et al. (2019) were the first in China to combine grey relational analysis and principal component analysis with LSTM for CO2 emissions forecasting. In India, Amarpuri et al. (2019) used a deep learning hybrid model, the convolutional neural network-long short-term memory (CNN-LSTM), to predict CO2 emissions. To anticipate CO2 emissions, Fatima et al. (2019) utilized simple exponential smoothing (SES) and autoregressive integrated moving average (ARIMA) models. They found the ARIMA model to be appropriate since it has the lowest fractional mean absolute error (FMAE) value.

Research gaps and contributions
Numerous studies have investigated CO2 emissions from a variety of perspectives, such as the causal relationship between CO2 emissions and the factors that may or may not influence them. However, there is still room for further work, and several shortcomings motivate this study, e.g., the data samples used previously were too small, some extremely crucial influencing factors were not taken into consideration, and little attention has been paid to Bangladesh. It is worth mentioning that Bangladesh badly needs CO2 forecasting because, in 2019, it produced around 0.66 tonnes of CO2 per capita, compared with only 0.05 tonnes per capita in 1970, which indicates an average annual growth rate of 5.48% (Anon, 2022). Taking into account all the influential factors on CO2 emissions, the problem formulation leads to multivariate time series forecasting methodologies, which are difficult to execute but provide more accurate results. Time series forecasting problems have been explicitly addressed by machine learning and regression techniques in the literature; however, to the best of the authors' knowledge, machine learning or deep learning approaches have not been used extensively to forecast CO2 emissions while considering those influential factors. To fill these research gaps, this paper evaluates the efficacy of several deep learning algorithms, along with a multivariate polynomial regression. The study will have a significant impact on CO2 emission policies since it is undertaken in the world's eighth most populous nation and takes a number of environmental issues into consideration.
Note (Table 1): →: unidirectional causality, ≠: not equal causality, GDP: real growth, TO: trade openness, EN: energy consumption, CO2: carbon dioxide emissions, FD: financial development, FDI: foreign direct investment, REN: renewable energy, HDI: human development index, IG: industrial growth, PD: population density, EG: economic growth, EF: economic factor, MINT: economies of Mexico, Indonesia, Nigeria and Turkey, OECD: The Organisation for Economic Co-operation and Development, GCC: The Gulf Cooperation Council, (-) and (+): negative and positive relationship.
This study makes the following contributions to the existing knowledge:
• Bangladesh suffers from a lack of environmental awareness, notably in terms of CO2 emissions. This study examines the dynamics and causal linkages among CO2 emissions, energy consumption, and GDP in Bangladesh for the period 1972-2019 by applying the FMOLS cointegration technique.
• Although similar indicators have been studied in the past, very few studies have been conducted on Bangladeshi data. More crucially, Bangladesh has received very little attention from scholars utilizing the FMOLS approach to analyze similar variables. This research adds value to the current literature by incorporating deep learning to forecast CO2 emissions, especially in the case of Bangladesh.
• CO2 emissions are forecast using deep learning models while accounting for GDP and per capita energy use, i.e., as a multivariate time series forecasting problem. This is an important consideration of this research owing to the higher accuracy from a practical perspective.
• The forecasting results are also analyzed using a multivariate polynomial regression.
The empirical and forecasted findings of this research will offer policymakers a better understanding of the relationship between energy consumption, CO2 emissions, and economic growth, allowing them to develop energy and climate policies that improve the environment by reducing CO2 emissions.
The paper is organized as follows. The data and methodology are presented in Section 2. Section 3 explains the experimental setup, data preprocessing, model configuration, accuracy metrics, and the experimental results. Single-step ahead forecasting of CO2 emissions is carried out in Section 5. Section 6 is reserved for a discussion of the findings as well as some prospective research directions. Finally, Section 7 outlines the conclusions and policy implications.

Variables
Global warming has now become a major environmental issue, and it is mostly caused by greenhouse gas emissions, particularly CO2 emissions, which have increased dramatically in recent years. CO2 emissions are related to both energy consumption and national economic growth. Bangladesh uses a variety of energy sources, but this study considers only electrical energy, which is referred to as energy consumption. This paper therefore examines the relationship between CO2 emissions, GDP, and energy consumption, with CO2 emissions as the dependent variable. GDP and electrical energy consumption are used as independent variables to derive the long-run relationship between these variables. The study uses Bangladesh's yearly time series data from 1972 to 2019, and all the necessary data have been collected from the World Development Indicators (WDI) 1 (Anon, 0000).
1 The authors are happy to share the data set upon request.
Fig. 1 plots the independent and dependent variables examined in this paper, where EL_C stands for electrical energy consumption (kWh per capita), and GDP and CO2 emissions are the other two indicators. Compared to the other variables, GDP exhibits a rising, roughly exponential trend. CO2 emissions range from 0.047 to 0.598 tons per capita, which are fractional values, whereas, according to the data source, the values of GDP and electrical energy consumption have two to four digits. Therefore, to put CO2 emissions on a scale comparable to the other two variables, their unit has been changed from metric tons per capita to kg per capita.

Unit root testing
Prior to assessing the long-run interconnection of multiple variables via cointegration in applied time series econometrics, the stationarity of the data must be evaluated (Yuping et al., 2021). To establish a long-term relationship between variables, they must be stationary at the first difference. The unit root test is performed to determine whether a variable is stationary or not. In this work, the augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests are used to carry out the unit root test. Table 2 shows the outcomes of the ADF and PP tests.
The ADF and PP test findings in Table 2 show that all the variables, i.e., GDP, CO2 emissions, and electrical energy consumption, are non-stationary in levels. After first differencing, the variables become stationary, since their probability (P) values are below the 5% significance level. Several techniques are available for estimating cointegrating relationships. The following subsections therefore explain the methodologies employed in this empirical investigation.
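For reference, the ADF test is usually stated through the regression below. This is the general textbook form; whether the constant and trend terms were included, and which lag order $p$ was used, are not stated in this section and are left open here.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad
H_0:\ \gamma = 0 \ \text{(unit root)}
\quad \text{vs.} \quad
H_1:\ \gamma < 0 \ \text{(stationary)}
```

A series is called integrated of order one, I(1), when $H_0$ cannot be rejected for $y_t$ but is rejected for the first difference $\Delta y_t = y_t - y_{t-1}$, which is the pattern reported for all three variables.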

Johansen cointegration test
Since the variables are stationary at the first difference, the unit root test results allow us to conduct the Johansen cointegration test. The purpose of this test is to assess the long-term relationship between the variables. The null hypothesis is that no cointegration exists between the dependent and independent variables, whereas the alternative hypothesis is that cointegration exists.

Cointegration test results
Before performing the Johansen cointegration test, it is necessary to determine the optimum lag length. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are used to determine the optimal lag order of the model, as described by Pesaran et al. (2001). The results of the Johansen cointegration test are reported in Table 3. According to the P-values, there is one cointegrating relationship among the selected variables at the 5% level of significance.
In Table 3, the trace statistic indicates a single cointegrating vector. The null hypothesis can be rejected since the trace value of 44.8247 for rank zero is greater than the 5 percent critical value of 35.0109. The maximal eigenvalue statistic gives a similar result, since its value of 27.5990 for rank zero is higher than the 5 percent critical value of 24.2520. The Johansen outcome thus indicates a long-term connection between the dependent and independent variables. In other words, a long-run association exists among them, and GDP and electrical energy consumption per capita have a high impact on CO2 emissions. To assess the impact of the explanatory variables, i.e., electric energy consumption per capita and GDP, on CO2 emissions, the FMOLS test is applied in the following subsection.
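The two statistics compared against critical values above have the standard Johansen form shown below. The symbols are the usual textbook ones and are not taken from this paper: $T$ is the number of usable observations, $n$ the number of variables, $r$ the hypothesized cointegration rank, and $\hat{\lambda}_i$ the estimated eigenvalues in decreasing order.

```latex
\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\!\left(1 - \hat{\lambda}_i\right),
\qquad
\lambda_{\max}(r, r+1) = -T \ln\!\left(1 - \hat{\lambda}_{r+1}\right)
```

With the values reported for rank zero, both decisions go the same way: $44.8247 > 35.0109$ for the trace statistic and $27.5990 > 24.2520$ for the maximum eigenvalue statistic, so the hypothesis of no cointegration is rejected at the 5% level.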

FMOLS estimation
The FMOLS technique gives credible estimates for small sample sizes and provides a robustness check on the results. The FMOLS technique, developed by Phillips and Hansen (1990), is used to estimate a single cointegrating relationship among a combination of I(1) variables (Bashier and Siam, 2014). After establishing a cointegrating relationship between the variables, it is necessary to estimate the long-term dynamics among them. The FMOLS approach is therefore used in this step to estimate the long-term elasticities, i.e., to assess the impact of GDP and electric energy consumption on CO2 emissions once the long-run association among the variables has been established in the previous section.
The outcomes suggest that electric energy consumption makes a positive and notable contribution to the increase in CO2 emissions. In contrast, the coefficient reported in Table 4 indicates a negative effect of GDP on CO2 emissions. According to these results, a decrease in GDP appears to increase CO2 emissions in a very slow and gradual manner. According to the FMOLS result, a one-unit increase in electric energy consumption causes a 1.526-unit increase in CO2 emissions, whereas a one-unit increase in GDP causes a 0.142-unit decline in CO2 emissions. The P-values for GDP and electric energy consumption are significant at the 5% level, so both factors are significant for CO2 emissions.
The previous subsections detailed the estimation of the cointegration models of per capita CO2 emissions for Bangladesh. The next part elaborates on the future projection of CO2 emissions using multivariate time series that include GDP and electric energy usage, both of which have a significant long-run impact on CO2 emissions, as shown by the FMOLS result. To estimate CO2 emissions accurately, we have used deep learning algorithms, and the analysis has been undertaken using a variety of accuracy metrics.

Forecasting problem formulation
A time-series approach is used in this paper to forecast CO2 emissions because conventional methods have several limitations (Hossain et al., 2021a). A time series $X = (x_1, x_2, \ldots, x_t)$ is a sequence of observations over a time frame, where $x_i$ refers to the observation at time $i$ and $t$ is the total number of observations. A forecasting problem may have two or more parallel input time series and an output time series that depends on the input time series. Here, GDP and energy consumption are the input time series, whereas CO2 emissions are the output. The models take the first three time steps of each parallel input time series as input features and relate them to the fourth time step of the output time series as the target variable. Since our goal is single-step ahead forecasting, we took the fourth observation as the output. The input window can span any number of previous time steps (such as 3, 5, or 10), but we obtained better results with three. For multi-step forecasting, additional time steps beyond the fourth would be used as targets. The scheme works as a sliding window. Output values for which no preceding input time steps are available, i.e., those at the start of the series, are discarded. Fig. 2 depicts a graphical representation of the problem formulation for the time series forecasting algorithm.
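The sliding-window formulation can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `make_windows` and the toy numbers are hypothetical, and the window length of three follows the text.

```python
def make_windows(gdp, energy, co2, n_steps=3):
    """Pair n_steps of the two input series with the next CO2 value.

    Returns X with shape (samples, n_steps, 2) as nested lists and y with
    one target per sample. The first n_steps CO2 values have no full
    input window before them, so they never appear as targets.
    """
    if not (len(gdp) == len(energy) == len(co2)):
        raise ValueError("all series must have the same length")
    X, y = [], []
    for start in range(len(co2) - n_steps):
        end = start + n_steps
        # One window: n_steps rows, each row holding [GDP, energy] at time t.
        X.append([[gdp[t], energy[t]] for t in range(start, end)])
        # Target: the observation right after the window (the fourth one
        # when n_steps is 3), i.e. single-step ahead forecasting.
        y.append(co2[end])
    return X, y


# Toy values for illustration only (not WDI data).
gdp = [10.0, 20.0, 30.0, 40.0, 50.0]
energy = [1.0, 2.0, 3.0, 4.0, 5.0]
co2 = [0.1, 0.2, 0.3, 0.4, 0.5]
X, y = make_windows(gdp, energy, co2)
```

With five observations and a window of three, two samples are produced. The nested shape (samples, time steps, features) is the layout that recurrent and one-dimensional convolutional layers typically expect.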

Long short-term memory (LSTM) neural network
The LSTM is an advanced gated memory unit that mitigates the vanishing gradient problem that restricts the effectiveness of a basic recurrent neural network (RNN) (Li et al., 2021; Zeroual et al., 2020). Owing to its strong long-term memory, this neural network is capable of exploring in depth the long-term relationships and trends of limited data samples (Hossain et al., 2020). An LSTM layer must be fed a three-dimensional input (samples, time steps, features), so the input data must be reshaped before being fed into it.
Fig. 3 illustrates the basic LSTM network structure, where the current input vector $x_t$, the previous output $h_{t-1}$, and the previous cell state $C_{t-1}$ are inputs to the LSTM cell. As seen in Fig. 3, the input gate ($i_t$), forget gate ($f_t$), output gate ($o_t$), and memory cell ($\tilde{C}_t$) are depicted by small boxes. They can be computed by using the following equations (Li et al., 2021; Zhou and Chen, 2021):
$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{1}$$
$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{2}$$
$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{3}$$
$$\tilde{C}_t = \tanh\left(W_C x_t + U_C h_{t-1} + b_C\right) \tag{4}$$
where $W$, $U$, and $b$ denote the input weights, recurrent weights, and biases of the respective gates, $\sigma(\cdot)$ represents the gate activation function, which here is a sigmoid function, and $\tanh(\cdot)$ is the hyperbolic tangent function. The cell output state $C_t$ and the layer output $h_t$ can be determined as follows:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$
$$h_t = o_t \odot \tanh\left(C_t\right) \tag{6}$$
where $\odot$ denotes the element-wise matrix/vector multiplication operator.
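To make the gate mechanics concrete, the sketch below runs one step of a standard LSTM cell with a scalar state. It assumes the common textbook formulation (sigmoid gates, tanh candidate, element-wise products); the function name `lstm_step` and the parameter layout are hypothetical and chosen for readability, not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function used as the gate activation."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step for a single scalar unit.

    params maps a gate name ("i", "f", "o", "c") to a tuple
    (w_x, w_h, b): input weight, recurrent weight, and bias.
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x_t + w_h * h_prev + b

    i_t = sigmoid(pre_activation("i"))        # input gate
    f_t = sigmoid(pre_activation("f"))        # forget gate
    o_t = sigmoid(pre_activation("o"))        # output gate
    c_tilde = math.tanh(pre_activation("c"))  # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde        # new cell state
    h_t = o_t * math.tanh(c_t)                # new layer output
    return h_t, c_t


# With all weights and biases at zero, every gate equals 0.5 and the
# candidate is 0, so the cell state is simply halved: c_t = 0.5 * c_prev.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("i", "f", "o", "c")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, params=zero_params)
```

In a real layer the scalars become vectors and matrices, but the order of operations is the same: the forget gate scales the old cell state, the input gate scales the candidate, and the output gate scales what is exposed as $h_t$.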

Convolution neural network (CNN)
The CNN has been widely used in a variety of fields, including pattern classification, image processing, radiology, and model identification (Yu et al., 2020; Yao et al., 2021). In this paper, a one-dimensional CNN is utilized to extract the characteristics of the complex interaction between nonlinear input and output data. Instead of conventional matrix multiplication, the convolution layer uses a mathematical convolution operator and cross-correlation. The convolution operation can be described in the following way (Hossain et al., 2021b):
$$c_i^l = f_1\left(w_i^l * x_i + b_j\right) \tag{7}$$
In the convolution layer, $c_i^l$ is the $i$th element of the $l$th feature, $w_i^l$ is the $i$th element of the $l$th kernel, $b_j$ is the bias, $*$ is the convolution operation, and $f_1$ denotes the activation function. $x_i$ is the $i$th element of the input, e.g., GDP and energy consumption here.
Various pooling approaches are available for the CNN architecture; however, max-pooling is the one most commonly employed after convolution layers. Using a pooling layer after convolution reduces the data dimension by combining the outputs of neuron clusters.
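The two operations described here, a one-dimensional convolution followed by max-pooling, can be illustrated with a small pure-Python sketch. It assumes a single input channel, a "valid" convolution without padding, a ReLU activation, and non-overlapping pooling windows; these choices and the toy numbers are illustrative assumptions, not the paper's configuration.

```python
def conv1d(signal, kernel, bias=0.0):
    """Slide the kernel over the signal (cross-correlation, no padding)
    and apply a ReLU activation to each weighted sum."""
    k = len(kernel)
    features = []
    for i in range(len(signal) - k + 1):
        weighted_sum = sum(kernel[j] * signal[i + j] for j in range(k)) + bias
        features.append(max(0.0, weighted_sum))  # ReLU
    return features


def max_pool1d(features, size=2):
    """Keep the largest value in each non-overlapping window."""
    return [max(features[i:i + size])
            for i in range(0, len(features) - size + 1, size)]


# Toy input and a kernel of size 3 (the kernel size reported as best).
signal = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
features = conv1d(signal, kernel=[0.5, 1.0, 0.5])
pooled = max_pool1d(features, size=2)
```

Six inputs and a kernel of size three give four feature values, and pooling with a window of two halves that to two, which is the dimension reduction the text refers to.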

CNN-Long short-term memory networks (CNN-LSTM)
The CNN-LSTM architecture uses CNN layers for feature extraction on input data and combines them with LSTM to improve sequence prediction (Rajagukguk et al., 2020; Guo et al., 2020). In this paper, a 1D CNN-LSTM network has been employed to forecast CO2 emissions. According to Fig. 4, there are five layers in the 1D CNN-LSTM network architecture. First, the original signals are passed into the first 1D convolution layer for feature extraction and feature selection. The second layer is a 1D max-pooling layer, followed by another 1D convolution layer and then by one LSTM layer. Finally, a fully connected layer with the ReLU activation function is placed at the end to produce the prediction.

Artificial neural networks (ANN)
ANNs are data-driven, flexible models capable of approximating a vast class of nonlinear problems to any desired level of accuracy. A wide range of ANN models have been developed and widely used in different applications (Hossain et al., 2021b; Guo et al., 2021). The multilayer perceptron is one of the most extensively used ANN models in time series forecasting (Panigrahi and Behera, 2017). The model architecture in this paper comprises an input layer, two hidden layers, and one output layer.
In a forecasting problem, the number of inputs and the number of neurons in the hidden layers are configurable. In contrast, the output layer contains only one neuron with its auxiliary components, such as weights, bias, and activation function. The fundamental design of a multilayer perceptron is shown in Fig. 5. The input layer accepts the input values, while the hidden layers process them. The output layer collects the data from the hidden layers and decides the final output. The training procedure is repeated until the difference between the neural network output and the target value falls within an acceptable range (Hameed et al., 2019). The mathematical formula for the ANN can be stated as follows (Guo et al., 2021):
$$A_n = \sum_{i=1}^{n} w_i I_i + b \tag{8}$$
where $n$ denotes the number of inputs, $w_i$ and $b$ indicate the weights and bias, respectively, $I_i$ is the $i$th input, and $A_n$ refers to the output of the ANN.
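The layer-by-layer computation of a multilayer perceptron with two hidden layers and a single output neuron can be sketched as follows. The weights, layer sizes, and the choice of ReLU in the hidden layers are placeholders picked for easy hand-checking; they are assumptions, not the trained model from this study.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron forms a weighted sum of the
    inputs, adds its bias, and applies the activation function."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical parameters: 2 inputs -> 2 neurons -> 2 neurons -> 1 output.
inputs = [1.0, 2.0]
hidden1 = dense(inputs, [[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0], relu)
hidden2 = dense(hidden1, [[1.0, 1.0], [0.0, 2.0]], [0.5, -1.0], relu)
output = dense(hidden2, [[0.2, 0.1]], [0.0], identity)[0]
```

Tracing the numbers by hand gives `hidden1 = [0.0, 3.0]`, `hidden2 = [3.5, 5.0]`, and an output of about 1.2. Training would adjust the weights and biases so that this output approaches the observed target.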

Experimental results
The experiments are carried out in Google Colaboratory (Colab) using Python 3.0 with open source libraries such as TensorFlow, Pandas, NumPy, and Keras. The experimental setup is based on a working environment with an Intel(R) Core(TM) i5-7400 CPU @ 2.5 GHz and 4 GB RAM under the 64-bit Windows 10 Pro operating system. The entire dataset was analyzed before being fed to the models. Time series forecasting of the CO2 emissions dataset is modeled using the four deep learning models mentioned in the sections below. The best results from eleven unique runs of each model are reported. The operation and parameter optimization of all four models are explored in the following subsections.

Data preprocessing
To forecast CO2 emissions, the CO2 data have been converted from metric tons per capita to kg per capita, as mentioned previously. Electric energy consumption (kWh per capita) and GDP per capita (current US$) are the other two variables. The dataset comprises only 48 yearly samples, which is insufficient to train a deep learning model appropriately. The Pandas library has a resample() function that can resample a series or a data frame. Down-sampling is a technique for grouping observations so that the number of samples becomes smaller than the original. In contrast, creating room for additional observations is known as up-sampling, which can also be accomplished with the resampling technique. We converted our annual dataset into a monthly dataset by up-sampling with "M" (month) as the target frequency in this function. The interpolate() method of the Pandas library is used to fill in missing values, and a wide range of simple and more complicated interpolation techniques is available. We used the linear method to fill the intermediate missing values, which produced a total of 565 data points.
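The arithmetic behind the 565 points can be checked with a short sketch: 48 yearly values leave 47 gaps, each gap is filled with 12 monthly steps, and the final year contributes one last point, so 47 × 12 + 1 = 565. The pure-Python function below reproduces linear up-sampling under that assumption; the name `upsample_linear` and the placeholder series are hypothetical, and this is not the Pandas call used by the authors.

```python
def upsample_linear(values, factor=12):
    """Insert factor - 1 linearly interpolated points between each pair
    of consecutive observations."""
    out = []
    for a, b in zip(values, values[1:]):
        for k in range(factor):
            # k = 0 reproduces the original point; k = 1..factor-1 are
            # the new, evenly spaced intermediate points.
            out.append(a + (b - a) * k / factor)
    out.append(values[-1])  # keep the final observation
    return out


# Placeholder series with 48 yearly values (not WDI data).
yearly = [float(v) for v in range(48)]
monthly = upsample_linear(yearly, factor=12)
n_points = len(monthly)  # 47 * 12 + 1
```

Linear interpolation adds no new information; it only densifies the series along straight segments between yearly observations, which is worth keeping in mind when reading the reported error metrics.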
• Missing values: As mentioned earlier Bangladesh-specific data was obtained from the WDI, and there were no missing figures.
• Scaling: The literature suggests that normalization approaches substantially influence the performance of a model. Because the raw time series values range widely, the optimization methods used to minimize the objective functions in some machine learning and deep learning models will not function properly unless the data are normalized. The normalization technique should therefore be selected based on the problem and model at hand. Normalization (also called min-max scaling) is a technique that scales each numerical input variable separately to the range 0-1, the range in which floating-point values have the most precision. Numerical input variables are normalized to acceptable ranges using the MinMaxScaler API for better performance. The data characteristics are given in Table 5 after the raw annual data have been transformed into monthly data using the up-sampling method.
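Min-max scaling and its inverse can be written out directly, as in the sketch below. It is a plain-Python stand-in for a library scaler; the function names are hypothetical. The two endpoint values correspond to the reported CO2 range (0.047 to 0.598 tons per capita, i.e., 47 to 598 kg per capita), while the middle value is made up for illustration.

```python
def min_max_scale(column):
    """Scale one variable to the range 0-1 and return the bounds so that
    predictions can later be mapped back to the original unit."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column], lo, hi
    return [(v - lo) / span for v in column], lo, hi


def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original range."""
    return [s * (hi - lo) + lo for s in scaled]


co2_kg = [47.0, 322.5, 598.0]
scaled, lo, hi = min_max_scale(co2_kg)
restored = min_max_invert(scaled, lo, hi)
```

Each variable is scaled on its own bounds, which is why the bounds are returned alongside the scaled values: a forecast produced in the 0-1 range has to be inverted with the CO2 bounds before error metrics in kg per capita can be computed.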

Model configuration
Because of over-fitting and under-fitting, the number of layers in a deep learning model affects its prediction accuracy. In deep learning, adding layers does not necessarily enhance prediction accuracy (Hossain et al., 2021b). With over-fitting, prediction models learn well during training but make larger errors during prediction. On the other hand, too few layers might cause uncertainty related to poor fitting, owing to the inability to map input-output relationships. The performance of a hybrid deep learning model must be improved by combining network layers in an appropriate manner (Hossain et al., 2021b). The configurations we have used are discussed below:
1. CNN-LSTM: One 1D convolutional layer, one LSTM layer, and a fully connected dense layer make up this model.

Parameters tuning
We carried out a substantial amount of tuning, a very time-consuming process, for all of the models because their performance depends heavily on parameter adjustment (Hossain et al., 2021b). Special effort was made to calibrate the parameters of the CNN-LSTM and CNN forecasting models because their structures are more complex than those of the other deep learning models (Yao et al., 2021). All hyperparameters were adjusted manually through trial and error. A limitation of our models is that we were not able to employ optimization algorithms such as grid search and random search. In the CNN model, tuning the kernel, which is defined as an operator that transforms the information in the entire data set, is an essential task in designing the convolutional layers. A kernel size of 3 produced the best result out of eleven unique runs, followed by kernel sizes 5, 6, and 7. Moreover, the best results were obtained with 64 filters. For the best outcomes, the number of hidden layers and neurons was also optimized for all deep learning models. In all models, a learning rate of 0.0001 provided a better level of accuracy. An early stopping approach was used to minimize overfitting and optimize the efficiency of the models.
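Early stopping follows a simple rule: training halts once the validation loss has failed to improve for a fixed number of consecutive epochs. The sketch below shows that rule in plain Python. The patience value and the loss curve are hypothetical, since the settings used in this study are not reported here.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch where the loss has not improved by
    more than min_delta for `patience` consecutive epochs. If that never
    happens, the last epoch is returned as the stop epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Illustrative loss curve: improves until epoch 2, then stalls.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
```

Here training stops at epoch 5 and the weights from epoch 2 would be kept. The later improvement at epoch 6 is never seen, which illustrates the trade-off in choosing the patience value.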

Forecasting evaluation metrics
A forecasting accuracy metric indicates how effectively a model predicts future values. Measuring it helps strengthen the predictive power of a model, which benefits overall planning and makes predictions more adaptable to changing circumstances. The most widely used forecast metrics are reviewed below one by one. The data are separated into training and test sets. All models were trained on the training set (80% of the data set); the RMSE, MAE, and MAPE values were then evaluated and compared on both the training and test sets for each model. The accuracy metrics used are explained below:

1. MAPE: MAPE measures the accuracy of a forecasting system. It is the average, over all periods, of the absolute difference between the actual and forecasted values divided by the actual value, expressed as a percentage. It can be written as follows (Li et al., 2020):

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A(t) - F(t)}{A(t)} \right| \tag{9}$$

In Eq. (9), A(t) and F(t) indicate the actual and predicted values, respectively, and n denotes the number of samples. Fig. 6 illustrates the MAPE of all the deep learning models, where DNN has the lowest MAPE values of 3.853 and 3.678 for the test and training set, respectively. The CNN model records the largest error, and the testing errors are slightly larger than the training errors for all models except DNN.

2. RMSE: RMSE is a common tool for analyzing the error of a model when predicting quantitative data. It is the standard deviation of the residuals, where residuals measure how far data points lie from the regression line in regression analysis. It can be expressed as follows (Zhou and Chen, 2021):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \bigl( f(i) - o(i) \bigr)^{2}} \tag{10}$$

In Eq. (10), f(i) is the forecasted value, o(i) is the observed value, and n stands for the number of samples. Fig. 7 shows the RMSE of each model used to estimate CO2 emissions for both the training and test data. According to the results, the DNN has the lowest RMSE values of 8.393 for the training data and 8.099 for the test data. CNN, on the other hand, has the largest error of all the models, which is 21.80 for the training set and 21.96 for the test set.

3. MAE: The MAE is a statistical error measure that evaluates the average magnitude of the errors in a set of forecasts without taking their direction into account. It is used to assess the accuracy of forecasts of continuous variables. In other words, over the verification sample, the MAE is the average of the absolute values of the differences between the forecast and the actual observation. It can be expressed as (Heydari et al., 2019):

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \bigl| y(i) - x(i) \bigr|$$

where y(i) denotes the predicted value, x(i) denotes the true value, and n is the total number of data points.
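The three metrics described above can be computed directly from their definitions. The following is a minimal sketch using only the Python standard library; the function names and the sample values in the usage note are illustrative assumptions and are not taken from the study's data.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean square error, in the units of the data."""
    n = len(actual)
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n)


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error, in the units of the data."""
    n = len(actual)
    return sum(abs(f - a) for a, f in zip(actual, forecast)) / n
```

With hypothetical actual values `[100.0, 200.0, 400.0]` and forecasts `[110.0, 190.0, 400.0]`, the MAPE is 5%, while the MAE and RMSE are roughly 6.67 and 8.16. RMSE is never smaller than MAE because squaring weights large errors more heavily, which is one reason to report both.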

Forecast accuracy metrics for multivariate polynomial regression
Simple polynomial regression is a form of regression applied to a single regressor, whereas multivariate polynomial regression involves multiple regressors. The latter is used to predict values when several variables are involved (Sinha, 2013). We decided to develop a polynomial regression after studying the CO2 emissions curve, and the result appears very good. Two steps are required to perform polynomial regression. Using the polynomial features function from sklearn, we first transform the inputs into polynomial terms. Features generated by raising the existing features to a power are known as polynomial features. This model was tested from the second to the fifth order, and it performs best at the third order. It is worth mentioning that the complexity of the model rises as the degree of the polynomial increases, so the order of the polynomial must be chosen very carefully. After the transformation, the input features are extended into several new terms. Moreover, the problem formulation differs from that of the deep learning models used in this study. Taking the first time step of GDP and energy consumption as input and the second time step of CO2 emissions as the target variable makes the system suitable for one-step-ahead forecasting. The general equation of a third-order polynomial with two variables can be expressed as (Sinha, 2013):

$$f(x, y) = a_{3,0}x^{3} + a_{2,1}x^{2}y + a_{1,2}xy^{2} + a_{0,3}y^{3} + a_{2,0}x^{2} + a_{1,1}xy + a_{0,2}y^{2} + a_{1,0}x + a_{0,1}y + a_{0,0}$$

where a_{3,0}, a_{2,1}, a_{1,2}, a_{0,3}, a_{2,0}, a_{1,1}, a_{0,2}, a_{1,0}, a_{0,1}, a_{0,0} ∈ R, and a_{3,0}, a_{2,1}, a_{1,2}, a_{0,3}, a_{2,0}, a_{1,1}, and a_{0,2} cannot be equal to 0.
In our model, x stands for GDP and y represents energy consumption. After the data are converted into polynomial features, linear regression is used to fit the parameters. Fig. 9 portrays the pipeline that summarizes the complete procedure.
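To make the feature expansion concrete, the sketch below builds the polynomial terms of two inputs by hand and evaluates a polynomial from a coefficient table indexed like a_{i,j}. It is an illustration rather than the study's code: the helper names `polynomial_terms` and `evaluate_polynomial` are hypothetical, and the coefficients in the usage note are arbitrary. For degree 3 and two inputs it yields ten terms, the same number that sklearn's `PolynomialFeatures(degree=3)` produces with its default bias column.

```python
from typing import Dict, Tuple

Exponents = Tuple[int, int]  # (power of x, power of y)


def polynomial_terms(x: float, y: float, degree: int = 3) -> Dict[Exponents, float]:
    """Map every exponent pair (i, j) with i + j <= degree to x**i * y**j."""
    return {
        (i, j): (x ** i) * (y ** j)
        for i in range(degree + 1)
        for j in range(degree + 1 - i)
    }


def evaluate_polynomial(
    coeffs: Dict[Exponents, float], x: float, y: float, degree: int = 3
) -> float:
    """Evaluate sum of a_{i,j} * x**i * y**j; missing coefficients count as 0."""
    terms = polynomial_terms(x, y, degree)
    return sum(coeffs.get(key, 0.0) * value for key, value in terms.items())
```

For instance, `polynomial_terms(2.0, 3.0)` contains ten entries, among them `(2, 1) -> 12.0` for the x²y term. Fitting the model then amounts to ordinary linear regression on these ten columns, which is why the number of parameters, and with it the risk of overfitting on a short annual series, grows quickly with the chosen degree.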

Validation losses
The validation error is a means of assessing the generalization of a model, that is, how well the model fits new data. Typically, the validation loss is larger than the training loss, but this problem has been overcome in our study with the early stopping approach. Consequently, the training and testing errors of the models employed are considerably close in magnitude.
Fig. 11 illustrates the validation losses, which were computed on the testing data set used as a validation set. The box plot enables us to observe the distributional characteristics of the validation losses over eleven unique runs. The CNN model shows the poorest performance, followed by the LSTM. The plot also shows an outlier for the CNN-LSTM model, whereas the DNN model outperforms the others, as its median line lies at the bottom.
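The quantities that a box plot displays can be computed explicitly, which clarifies what "median line" and "outlier" mean here. The sketch below is illustrative only: the function names are hypothetical, quartiles are computed by linear interpolation, and outliers are flagged with the common 1.5 × IQR rule, both of which are assumptions because the text does not state the plotting settings. The loss values in the usage note are made up and are not results from the study.

```python
from typing import Dict, List, Sequence, Union


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between the two nearest ranks."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def box_summary(losses: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Median, quartiles, and 1.5*IQR outliers of a set of validation losses."""
    values = sorted(losses)
    q1 = quantile(values, 0.25)
    median = quantile(values, 0.50)
    q3 = quantile(values, 0.75)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}
```

Applied to eleven hypothetical losses `[1, 2, ..., 10, 50]`, the summary gives a median of 6 and flags 50 as the only outlier. Comparing medians rather than means keeps a single unlucky run from dominating the ranking of the models.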

Forecasting the CO2 emissions
This section presents the original and predicted CO2 emissions for all of the models in Fig. 12. After training the models, it is necessary to assess how well they predict CO2 emissions. For this purpose, the training data set was used to measure the accuracy metrics displayed in the previous section.
The whole data set is predicted in this section, allowing us to compare all the models in terms of their actual and predicted curves. According to Fig. 12(d), the CNN model shows the poorest result of all the models examined in this paper for predicting CO2 emissions. Fundamentally, CNN has a wide range of hyperparameters and various specific architectures, which are often complex and make it difficult to select the most appropriate values among the many possible combinations. Aside from that, CNN is very sensitive to the setting of its hyperparameters, which have a significant effect on the efficiency and behavior of its architecture. Fig. 12(d) shows that the projected value is far from the actual value in the starting region, indicating a poor result.
Although the hybrid deep learning model was expected to produce the best outcomes, this did not happen in this study. According to the data in the testing set, CNN-LSTM deviates slightly more from the real values in the peak region, as shown in Fig. 12(b). To enhance the performance of the hybrid deep learning model, it is also necessary to fine-tune the hyperparameters, such as the number of layers and neurons. However, the CNN-LSTM hybrid model has substantially more complexity (Guo et al., 2020). To obtain a better result from this model, an experimental approach was followed to fine-tune its parameters. The DNN model provided the best result and generalizes much better, as can be seen in Fig. 12(a).

Forecasting one step ahead
A single-step-ahead forecast is easier for a univariate time series than for a multivariate one. A multivariate forecasting problem is more complicated because the output series depends on one or more input time series that also need to be forecasted. We performed our analysis using data from 1972 to 2019 and proceeded to estimate CO2 emissions for the years 2020 and 2021. To forecast the CO2 emissions in 2021, we need the data from 2020 as input, but the actual electrical energy consumption per capita is not available on WDI, so we had to rely on other sources. The per capita electrical energy generation was 510 kWh in 2019, which we have taken as the consumption per capita (Ebn Sharif, 2020). Under the problem formulation described earlier, the observed data allow us to forecast CO2 emissions only for the year 2020. Therefore, to forecast 2021 in the next phase, we added the forecasted value of CO2 emissions for 2020 to the original data set. As shown in Fig. 13, all the deep learning models except CNN predict a rising trend in CO2 emissions. The DNN, LSTM, CNN, and CNN-LSTM predict CO2 emissions of 665, 650.53, 575, and 612.50 kg per capita, respectively, for the year 2020, compared with 598 kg per capita in 2019.
Although both GDP and energy consumption increased from 2018 to 2019, the FMOLS results indicate that GDP is less significant than energy consumption. This supports the increasing trend of CO2 emissions, as energy consumption has a positive impact on them. Later, the growth rate of GDP outpaces the growth rate of energy consumption, resulting in a reduction in the trend by 2021. For the year 2021, the models provide forecasted values of 645, 668, 580, and 619 kg per capita for DNN, LSTM, CNN, and CNN-LSTM, respectively. The polynomial regression, which forecasts CO2 emissions of 654 and 665.5 kg per capita in 2020 and 2021, respectively, is quite close to the LSTM.
A better outcome could be expected from these models if we had access to the actual data in some of the situations mentioned earlier. Nevertheless, the outcomes remain reliable. Incorporating more explanatory factors into the model may lead to a more accurate estimation of CO2 emissions. From the results, we may conclude that energy consumption is one of the vital factors behind rising emissions. Switching to clean energy and properly formulating and implementing environmental laws and policies should be targeted to minimize emissions in the future (Rahman and Kashem, 2017).
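The two-stage procedure described in this section, forecasting 2020 from observed data and then feeding that forecast back in to reach 2021, can be sketched as follows. This is a simplified illustration under stated assumptions: each year is represented as a (GDP, energy, CO2) row, `predict_next` stands in for any trained one-step model, and the helper names and the toy model in the usage note are hypothetical rather than the study's implementation.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, float, float]  # (gdp, energy, co2) for one year


def make_one_step_pairs(rows: Sequence[Row]) -> Tuple[List[Row], List[float]]:
    """Pair each year's row with the CO2 value of the following year."""
    inputs = list(rows[:-1])
    targets = [row[2] for row in rows[1:]]
    return inputs, targets


def recursive_forecast(
    predict_next: Callable[[Row], float],
    rows: Sequence[Row],
    future_exog: Sequence[Tuple[float, float]],
) -> List[float]:
    """Forecast len(future_exog) + 1 years, one step at a time.

    The first forecast uses the last observed row. Each later forecast uses
    externally supplied (gdp, energy) values for the preceding year together
    with the CO2 value that was just forecast for it.
    """
    history = list(rows)
    co2_next = predict_next(history[-1])
    forecasts = [co2_next]
    for gdp, energy in future_exog:
        # Append a synthetic row whose CO2 entry is the previous forecast.
        history.append((gdp, energy, co2_next))
        co2_next = predict_next(history[-1])
        forecasts.append(co2_next)
    return forecasts
```

As a usage example, with a toy model `lambda row: row[2] + 0.5 * row[1]`, observed rows `[(1.0, 10.0, 100.0), (2.0, 20.0, 110.0)]`, and one extra pair `(3.0, 30.0)` of externally sourced inputs, the call returns `[120.0, 135.0]`. Because the second forecast depends on the first, any error in the first step carries forward, which is why the availability of actual input data matters for the 2021 estimate.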

Discussion
This long-term forecasting study obtained its best outcomes with the DNN technique. Compared to the other approaches, the DNN algorithm performed better on the evaluation metrics. The effectiveness of this forecasting model was assessed by comparing it to other deep learning models, namely CNN, LSTM, and a hybrid model that combines CNN and LSTM, as well as to a multivariate polynomial regression. The MAPE, MAE, and RMSE were used to compare these models. The accuracy metrics of the multivariate polynomial regression and the LSTM model are quite similar. For the years 2020 and 2021, the polynomial regression provides predicted output that is nearly the same as that of the LSTM model, which is a very satisfactory result for single-step-ahead prediction. During this work, the design of all models was guided by information obtained through trial and error. Several problems occurred during this procedure, particularly in data processing and in choosing the best forecasting model. The procedure was repeated until the models achieved the desired outcomes. The results of this analysis might have been more accurate if we had used a hyperparameter optimization technique. Selecting the most appropriate set of parameters will therefore be a significant concern for practitioners in future studies. The greenhouse effect is well known to have several severe consequences, including an increase in global pests and diseases, sea-level rise, a shift in temperature, and desertification. Bangladesh should therefore pay greater attention to CO2 emissions, make the necessary changes to its climate and energy policies, and collaboratively address climate change problems, regardless of whether CO2 emissions are increasing or decreasing. Even though Bangladesh is an overpopulated and developing nation, industrial output continues to increase in the country. These factors need to be considered if we want to forecast CO2 emissions correctly in the future. Many additional techniques, such as DOLS and ARDL, may be used to derive empirical cointegration, long- and short-run dynamics, and causal connections (Bastola and Sapkota, 2015; Rahman, 2017). Finally, to sustain the economy and public health over time, Bangladesh must implement environmental protection laws.

Conclusion and policy implications
In this paper, the FMOLS approach has been used to analyze the impact of energy consumption and GDP on CO2 emissions in Bangladesh from 1972 to 2019. The ADF and PP unit root tests were used to check the stationarity of the variables. The estimated outcomes provide evidence of cointegration and a long-term connection between GDP, energy consumption, and CO2 emissions. As previously stated, energy consumption has a positive impact on CO2 emissions, while GDP has a negative impact in Bangladesh. However, the nonlinear relationships between CO2 emissions and their influencing factors are highly complicated, making CO2 forecasting a difficult job. A large number of researchers have made significant contributions to forecasting CO2 emissions. The accuracy of these approaches seems to be poor because the complicated nonlinear relationship between CO2 emissions and other factors is very difficult to model. Furthermore, the majority of these techniques cannot predict CO2 emission levels unless all the necessary variables have been obtained in advance. The second part of this study explores the potential of several state-of-the-art deep learning techniques for forecasting CO2 emissions in Bangladesh. In addition, a simple multivariate polynomial regression was used to forecast CO2 emissions, and it yields a result quite similar to that of the LSTM, one of the deep learning models. In the future, this work will be extended to develop a multistep prediction model of CO2 emissions that takes a wider range of variables into account at the same time. It is expected that the forecasting outcome can be further improved by gathering a larger amount of data.
It is undeniable that climate change is unfolding, and deep learning itself plays a part in this crisis. Fortunately, several recent studies have begun to detail the environmental costs of their deep learning approaches, sometimes even including CO2 emissions as an objective to be minimized. Considering that Bangladesh is a densely populated country in the early stages of development, the study's findings indicate that energy consumption has an impact on CO2 emissions. In this context, the immediate import and installation of energy-efficient technologies may result in long-term CO2 reduction.
Although the focus of this study is on Bangladesh, the findings can be used to provide a viable alternative for energy usage in the future, reducing CO2 emissions globally. To reduce CO2 emissions, Bangladesh must implement energy conservation and environmental protection policies that encourage local and international investors to invest in environmentally friendly resources. Additionally, the government should impose emission restrictions on enterprises and factories that release CO2, particularly the coal-based power plants whose number is rising in Bangladesh. The country's energy policy should emphasize research and investment in clean energy. In this respect, technological advancement through research and development is essential: carbon capture and storage (CCS) technology is an effective way to capture carbon, but it is costly. Furthermore, the authors propose that readily available renewable resources with intelligent control can be a viable solution for effectively reducing CO2 emissions.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
where · denotes the matrix multiplication operation; b_f, b_i, b_o, and b_C are four bias vectors; the weight matrices U_f, U_i, U_o, and U_C connect the previous output to the three gates and the memory cell; W_f, W_i, W_o, and W_C are the weight matrices; the σ_g()

Fig. 8 displays the MAE of all the developed models. The bar chart illustrates a gradual decrease in MAE values starting with CNN and ending with the DNN, which has the lowest error of 5.933 for the training set and 5.820 for the test set. The training and testing errors for the CNN model are 17.164 and 17.469, respectively.

Fig. 10. RMSE, MAE, and MAPE values for multivariate polynomial regression model.

Data are split into training and test sets in an 80%/20% ratio before applying linear regression. Both the training and test sets are used to assess the accuracy of the system. Fig. 10 shows the MAE, RMSE, and MAPE of the polynomial regression. The bar chart depicts training errors of 9.58, 7.33, and 5.04 for RMSE, MAE, and MAPE, respectively. The testing data show errors of 11.03, 7.86, and 5.54 in the same order. The errors are quite similar to those of the LSTM model. Because of the limited number of data points available, there is some overfitting. As we have used annual data to predict the following year, the training range for the model is very narrow, resulting in slight overfitting. This can be resolved by collecting more data in the future.

Table 1
Summary of recent studies.

Table 5
Data Features.
This model is a combination of one fully interconnected LSTM layer and one dense layer. The only difference between this model and CNN is the batch size, which is 8 in this case. This model performs best with a batch size of 8 in this work. 4. DNN: This model comprises three dense layers, having 32 neurons, 8 neurons, and one fully connected layer, respectively. All hidden layers use the ReLU activation function with a batch size of 8, and this model outperforms all the other models.
The root mean square loss function is used in conjunction with the Adam optimizer. The batch size is 32, with a learning rate of 0.0001. 2. CNN: This model comprises one 1D convolutional layer and a fully connected dense layer. In combination with the Adam optimizer, the root mean square loss function is