Scenario analysis of transportation carbon emissions in China based on machine learning and deep neural network models

Coping with the relation between the increase in carbon emissions and energy consumption in the transportation sector is a pressing issue today. Machine learning and deep neural networks were used in this study to explore the influential factors and future trends in transportation carbon emissions. First, least absolute shrinkage and selection operator (LASSO) regression was adopted to screen out the key factors influencing transportation carbon emissions. Second, the prediction performance of the long short-term memory (LSTM) network, generalized regression neural network (GRNN), and back propagation (BP) network was compared, and an improved LSTM optimized by the sparrow search algorithm (SSA) was proposed. Third, LASSO-SSA-LSTM was used to predict the transportation sector's future carbon emissions trends under different scenarios. The results suggested that transportation carbon emissions in China presented a trend of 'rapid increase-fluctuating decrease-continuous increase' from 2010 to 2019. Although the main determinant in curbing the rising rate of carbon emissions effectively is the continuous development of renewable energy technology, the variation in transportation carbon emissions in China under eight scenarios showed significant differences. Generally, systemic changes and innovations are crucial to accommodate China's future low-carbon and sustainable transportation development.


Introduction
Continuous climate warming is causing irreversible losses to the global ecology, society, and economy [1], and the lock-in effect of the current high-carbon growth path will continue the warming trend in the future [2]. Climate Change 2022: Mitigation of Climate Change is the third part of the Intergovernmental Panel on Climate Change's Sixth Assessment Report (AR6) and shows that greenhouse gas emissions over the last decade are at the highest levels in human history. The AR6 indicates that immediate and large reductions of emissions across all sectors are necessary to limit global warming to below 1.5 °C. In 2019, direct greenhouse gas emissions from the transportation sector accounted for 23% of global energy-related carbon emissions [3]. Moreover, the transportation sector is one of the fastest-growing industries and highest energy consumers [4]. Reducing transportation carbon emissions is challenging, and transportation-related emissions have increased more rapidly in China than in Europe or North America [3]. As meeting climate mitigation goals would require transformative changes in the transportation sector, significant decarbonization in the sector is urgent and crucial. This suggests that more attention should be given to reducing transportation carbon emissions in China. Therefore, this study conducted an innovative data-driven analysis to investigate China's carbon emissions attributable to transportation. The goal is to propose novel statistical methods to inform low-carbon analyses and provide a basis for relevant policy recommendations.
The existing research on the subject of transportation carbon emissions has focused on predicting and analyzing its determining factors. On the whole, the determinants of transportation carbon emissions are related primarily to population, urbanization level, economic growth, GDP per capita, transportation structure, transportation energy intensity, energy consumption turnover, etc. At the same time, different regions have different degrees of influence and directions [5]. Further, economic and social development and citizens' improved income level are the main factors that affect the continuous increase in transportation carbon emissions. In addition, the urban population density and increased area of dense urban road networks promote the increase in traffic carbon emissions [6], as do the higher level of city public transportation, city planning, and transportation organization, all of which have had a considerable adverse effect [7,8]. Studies of provincial transportation carbon emissions have found that the advancement in urbanization, economic growth, infrastructure construction in large and medium-sized cities, and private car ownership increases carbon emissions significantly. However, while traffic energy efficiency can reduce carbon emissions effectively [9,10], most of the factors that affect transportation carbon emissions have not been selected in previous prediction models. Typically, these models are based upon optimization under rigorous assumptions and theoretical derivations. In contrast, the least absolute shrinkage and selection operator (LASSO) has the ability to screen for factors with only a slight effect and simplify the forecast models, which can solve the problems of multicollinearity and over-fitting.
Similarly, it is necessary to determine the regular pattern of carbon emissions to capture the trends in transportation carbon emissions. Scholars have provided abundant models to predict carbon emissions, including the scenario analysis prediction model, econometric model, and neural network model. The scenario analysis prediction model typically designs different scenarios to determine the optimal carbon emissions reduction scenario. It has been used to assess the carbon reduction potential of China's transportation sector based upon the Long-range Energy Alternatives Planning (LEAP) system, with simulation projections and scenario analyses conducted for distinct scenarios [11,12]. The stochastic impacts by regression on population, affluence, and technology (STIRPAT) model has been used extensively to forecast carbon emissions [13]. Based upon integrated assessment models (IAMs), it was found that the peak date and optimal reduction pathway of carbon emissions are influenced by variations in the model's structure, technology assumptions, energy consumption, economic growth, and policy measures [14,15]. The econometric and neural network models are also used widely in forecast research. Based upon the STIRPAT model, the vector autoregressive model was adopted to predict emissions in 2030 [16].
An optimized generalized regression neural network (GRNN) model and an improved extreme learning machine (IELM) model have been proposed to predict carbon emission intensity and influencing factors in China [17,18]. With the increasing requirements for forecast accuracy and technology advancement, artificial intelligence (AI) algorithms have been used more widely to predict carbon emissions because of their excellent learning ability, strong nonlinear fitting performance, and robustness [19]. While most of the carbon emissions forecasting methods use LEAP and STIRPAT systems and convert them into linear models, or use a single AI algorithm, the long short-term memory (LSTM) forecast model optimized by the sparrow search algorithm (SSA) constructed in this study can reflect the complex non-linear relations in the data better and achieve improved forecast performance.
In summary, scholars have thoroughly discussed the determinants of transportation carbon emissions, the performance of single and hybrid models, and forecast results under multiple scenarios. These studies provide a valuable reference for further work that pursues more accurate forecasts of data with complex interaction characteristics. Building upon this literature, this research offers two crucial contributions. First, it explored a machine learning approach to screen factors related to transportation carbon emissions: the LASSO model was applied to mitigate over-fitting and multicollinearity among the variables, which improves the model's ability to generalize and the efficiency of prediction.
Next, an innovative model to predict carbon emissions is proposed based upon SSA-LSTM. Deep neural networks and machine learning models have been developed to manage the carbon peak problem, which provides a new reference for studies on the carbon emissions issue. SSA was used to optimize the LSTM neural network to improve the forecast performance. In addition, a scenario analysis was applied to determine the way the growth rates in influencing variables affect the degree of transportation carbon emissions. The model can predict China's transportation carbon emissions more accurately, and its scenario analyses are significant, as they assist better decision-making in reducing transportation carbon emissions.

The accounting of transport carbon emission
The methods for measuring transport carbon emissions are divided into the 'top-down' approach and the 'bottom-up' process analysis approach. The bottom-up calculation requires the mileage of different transportation types and the energy consumption per unit of distance driven, and these data are not available in China's existing statistical system. Therefore, this research adopted the top-down method to measure the carbon emissions of transport energy consumption [20]:
$$C = \sum_{i} E_i \times K_i \times F_i \quad (1)$$
where $C$ denotes the transportation carbon emissions, $E_i$ is the physical consumption of class $i$ transportation energy, $K_i$ is the standard coal conversion coefficient of class $i$ energy, and $F_i$ is the carbon emission coefficient of class $i$ energy (the specific coefficients are shown in table 1).
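The top-down accounting can be illustrated with a short Python sketch. It only shows the sum-of-products structure of the calculation; the fuel names and every numeric value below are hypothetical placeholders and are not the coefficients of table 1.

```python
# Top-down accounting sketch: emissions = sum over fuels of E_i * K_i * F_i.
# NOTE: fuel names and all numbers are hypothetical placeholders, NOT table 1 values.

def transport_carbon_emissions(consumption, coal_coeff, carbon_coeff):
    """Sum E_i * K_i * F_i over every energy type i present in `consumption`."""
    total = 0.0
    for fuel, physical_amount in consumption.items():
        total += physical_amount * coal_coeff[fuel] * carbon_coeff[fuel]
    return total

# E_i: physical consumption of each energy type (illustrative units).
consumption = {"gasoline": 100.0, "diesel": 200.0}
# K_i: standard coal conversion coefficient (illustrative values).
coal_coeff = {"gasoline": 1.5, "diesel": 1.4}
# F_i: carbon emission coefficient (illustrative values).
carbon_coeff = {"gasoline": 0.5, "diesel": 0.6}

total = transport_carbon_emissions(consumption, coal_coeff, carbon_coeff)
print(f"total emissions (illustrative units): {total:.2f}")
```

Keeping the three coefficient tables keyed by energy type makes it easy to swap in the real statistical yearbook values without changing the function.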

LASSO regression
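LASSO has been widely employed for variable selection and complexity regularization, which can effectively avoid over-fitting and improve prediction efficiency [21]. A more refined model is obtained by adding a penalty function that shrinks some regression coefficients exactly to zero. It can be written as
$$\hat{\beta} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\} \quad (2)$$
where $n$ is the number of observations, $p$ is the dimension of variables, $y_i$ and $x_{ij}$ are the response and the $j$-th explanatory variable of observation $i$, $\beta$ represents the regression coefficients, and $\lambda$ is a tuning parameter that controls the degree of shrinkage of the regression coefficients [22,23].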
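To make the screening effect of the penalty concrete, the following is a minimal pure-Python sketch of LASSO fitted by cyclic coordinate descent with soft thresholding. It is an illustration, not the implementation used in this study: it assumes centered data (so no intercept), and the tiny data set is invented so that the third variable has only a slight effect on the response.

```python
def soft_threshold(rho, threshold):
    """Shrink rho toward zero by `threshold`; values inside the band become 0."""
    if rho > threshold:
        return rho - threshold
    if rho < -threshold:
        return rho + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise sum_i (y_i - sum_j x_ij b_j)^2 + lam * sum_j |b_j| (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # The factor 1/2 comes from differentiating the squared-error term.
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta

# Invented, centered toy data: y = 3*x1 + 0.1*x3, and x2 is irrelevant.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.1, 2.9, -3.1, -2.9]

beta = lasso_coordinate_descent(X, y, lam=2.0)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", beta, "selected columns:", selected)
```

With the penalty switched on, the weak third variable and the irrelevant second variable are set exactly to zero, while the dominant first variable is kept with a shrunken coefficient; this is the behavior exploited for factor screening.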

LSTM neural network
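The LSTM neural network was proposed by Hochreiter and Schmidhuber [24]. Its structure is displayed in figure 1. The LSTM consists of a forget gate unit, an input gate unit, and an output gate unit, each of which controls the weight given to historical information. The expressions are shown in equations (3)-(8):
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (3)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (4)$$
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \quad (5)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \quad (6)$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (7)$$
$$h_t = o_t \odot \tanh(C_t) \quad (8)$$
where $f_t$, $i_t$, and $o_t$ stand for the forget gate, input gate, and output gate, and $\tilde{C}_t$ stands for the candidate state memory unit at time step $t$; $C_t$ is the cell state, $h_t$ is the hidden state, and $x_t$ is the input. $W_i$, $W_f$, and $W_o$ are the weights of the input gate, forget gate, and output gate, and $W_c$ is the weight of the candidate state memory unit. $b_i$, $b_f$, and $b_o$ are the biases of the input gate, forget gate, and output gate, while $b_c$ is the bias of the candidate state memory unit. $\sigma$ denotes the sigmoid activation function [25].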
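The gate computations of an LSTM cell can be traced with a toy Python sketch for a single hidden unit and a scalar input. It follows the standard LSTM formulation (forget gate, input gate, candidate state, cell state, output gate, hidden state); the weight and bias values are arbitrary illustrations, not trained parameters from this study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step for one hidden unit and a scalar input.

    W[g] = (weight on h_prev, weight on x_t) and b[g] is the bias,
    for g in {"f", "i", "c", "o"} (forget, input, candidate, output).
    """
    def affine(g):
        return W[g][0] * h_prev + W[g][1] * x_t + b[g]

    f_t = sigmoid(affine("f"))            # forget gate
    i_t = sigmoid(affine("i"))            # input gate
    c_tilde = math.tanh(affine("c"))      # candidate state
    c_t = f_t * c_prev + i_t * c_tilde    # new cell state
    o_t = sigmoid(affine("o"))            # output gate
    h_t = o_t * math.tanh(c_t)            # new hidden state
    return h_t, c_t

# Arbitrary illustrative parameters (hypothetical, not fitted).
W = {"f": (0.5, 0.1), "i": (0.4, 0.3), "c": (0.2, 0.9), "o": (0.3, 0.6)}
b = {"f": 0.0, "i": 0.0, "c": 0.0, "o": 0.0}

h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.1]:  # a short normalised input sequence
    h, c = lstm_step(x, h, c, W, b)
print("final hidden state:", h, "final cell state:", c)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), the hidden state always stays strictly between -1 and 1, which is one reason inputs are normalized before training.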

Procedure of SSA-LSTM
The SSA is a novel swarm intelligence optimization method inspired by the foraging behavior of sparrows. It has been widely used in engineering and economic problems because it is easy to understand, converges quickly, and requires no gradient information [26]. The parameters that govern the LSTM network structure and learning process have a significant impact on the prediction accuracy of transport carbon emissions. Hence, it is necessary to optimize the critical parameters of the LSTM. In this study, SSA was used to optimize these critical parameters, namely the number of hidden layers, the initial learning rate, and the L2 regularization coefficient.
The optimization procedure is as follows: (1) Initialization: The training and testing data are imported and normalized. The parameters of SSA, such as the number of search agents, the maximum iterations, and the lower and upper bounds, are initialized at this stage.
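The general idea of a sparrow-search-style hyperparameter search can be sketched in Python as follows. This is a deliberately simplified variant written for illustration: the producer, scrounger, and scout updates are abbreviated relative to the published algorithm, and `toy_loss` is a hypothetical stand-in for the validation error of an LSTM trained with the candidate parameters. It is not the procedure or the settings used in this study.

```python
import math
import random

def sparrow_search(objective, bounds, n_agents=20, n_iter=50,
                   producer_frac=0.2, scout_frac=0.1, safety=0.8, seed=0):
    """Simplified sparrow-search-style minimiser over box constraints."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(pos):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(pos, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    fit = [objective(p) for p in pop]
    best_pos, best_fit = min(zip(pop, fit), key=lambda t: t[1])
    best_pos = list(best_pos)
    n_prod = max(1, int(producer_frac * n_agents))
    n_scout = max(1, int(scout_frac * n_agents))

    for _ in range(n_iter):
        order = sorted(range(n_agents), key=lambda k: fit[k])
        pop = [pop[k] for k in order]  # best agents first
        worst = list(pop[-1])
        alarm = rng.random()
        # Producers: contract their position while safe, random-walk when alarmed.
        for i in range(n_prod):
            if alarm < safety:
                scale = math.exp(-(i + 1) / (rng.uniform(1e-3, 1.0) * n_iter))
                pop[i] = clip([v * scale for v in pop[i]])
            else:
                pop[i] = clip([v + rng.gauss(0, 1) for v in pop[i]])
        leader = list(pop[0])
        # Scroungers: the worse half relocates, the rest move around the leader.
        for i in range(n_prod, n_agents):
            if i > n_agents / 2:
                pop[i] = clip([rng.gauss(0, 1) * math.exp((w - v) / (i + 1) ** 2)
                               for v, w in zip(pop[i], worst)])
            else:
                pop[i] = clip([l + abs(v - l) * rng.choice((-1, 1)) / dim
                               for v, l in zip(pop[i], leader)])
        # Scouts: a random subset jumps around the best position found so far.
        for i in rng.sample(range(n_agents), n_scout):
            pop[i] = clip([b + rng.gauss(0, 1) * abs(v - b)
                           for v, b in zip(pop[i], best_pos)])
        fit = [objective(p) for p in pop]
        k = min(range(n_agents), key=lambda j: fit[j])
        if fit[k] < best_fit:
            best_pos, best_fit = list(pop[k]), fit[k]
    return best_pos, best_fit

def toy_loss(params):
    """Hypothetical stand-in for an LSTM validation loss over two parameters."""
    a, b = params
    return (a - 1.5) ** 2 + (b + 0.5) ** 2

bounds = [(-5.0, 5.0), (-5.0, 5.0)]  # lower and upper bound of each parameter
best_pos, best_fit = sparrow_search(toy_loss, bounds)
print("best parameters:", best_pos, "loss:", best_fit)
```

In a real application the objective would train an LSTM with the candidate hyperparameters and return its validation error, so each evaluation is expensive and the number of agents and iterations has to be kept small.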

Evaluation criteria
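To evaluate the prediction performance of the models, the root mean square error (RMSE), mean absolute error (MAE), and goodness-of-fit R-squared ($R^2$) are used in this paper. These criteria are calculated by
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (9)$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (10)$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \quad (11)$$
where $n$ is the number of predicted carbon emission values, $y_i$ denotes the actual value of carbon emissions, $\hat{y}_i$ denotes the predicted value, and $\bar{y}$ denotes the average of the actual values. The prediction error is greater when the values of RMSE and MAE are larger.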
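The three criteria are straightforward to compute; the sketch below implements them in plain Python using their standard definitions. The actual and predicted values are invented for illustration only.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Invented example values (not data from this study).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print("RMSE:", rmse(actual, predicted))
print("MAE:", mae(actual, predicted))
print("R2:", r_squared(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of the error distribution.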

Research flow
The process of this research is displayed in figure 2 and includes three parts. First, this study analyzes the influencing factors of transportation carbon emissions that have been proposed in the existing literature and applies LASSO machine learning to screen out the key variables, which are used in the extended STIRPAT model. Second, a hybrid forecasting model, an LSTM neural network optimized by the SSA algorithm, is constructed, and the goodness of fit and prediction performance of the LSTM, GRNN, and BP neural networks are compared. It is noteworthy that SSA-LSTM obtains the fastest speed and best accuracy. Last, based on different economic and social environments, eight different scenarios are introduced. The SSA-LSTM model, which has the best prediction performance, is used to forecast the transportation carbon emission trend in China and to identify a realistic pathway for carbon emission reduction in the transportation sector.

LASSO regression result
Provincial panel data from 2007 to 2019 in China were collected. Because of the low availability and integrity of data from Xizang, Macao, Hong Kong, and Taiwan, only the 30 remaining provinces were studied. The data in this paper were obtained from the China Transportation Statistical Yearbook, China Energy Statistical Yearbook, and China Statistical Yearbook. Many factors influence transport sector carbon emissions. By using the extended STIRPAT model, this paper summarized 15 potential indicators that influence transportation carbon emissions: passenger turnover (x1); freight turnover (x2); total population at the end of year (x3); urban population (x4); total GDP (x5); tertiary industry value (x6); total energy consumption (x7); proportion of natural gas and electricity (x8); GDP per capita (x9); urbanization rate (x10); contribution ratio of tertiary industry to GDP (x11); fiscal expenditure on transportation (x12); e-commerce transaction volume (x13); renewable energy (x14); and thermal power generation (x15).
However, if all 15 factors are used as inputs to the forecast model, the model becomes too complex and suffers from multicollinearity. Therefore, this paper adopted the LASSO regression method to screen out the most important factors related to transportation carbon emissions. As the value of the parameter lambda increases, the number of independent variables retained in the LASSO regression decreases and the model's prediction error increases. The parameter lambda was therefore chosen as the value that minimizes the model's mean square error, which gave lambda.min = 0.0001. After the optimal lambda was obtained, the LASSO regression model was refitted on the full training data set, and finally seven variables were selected (figure 3): passenger turnover (x1); freight turnover (x2); total GDP (x5); total energy consumption (x7); proportion of natural gas and electricity (x8); GDP per capita (x9); and renewable energy (x14). These were considered the main factors that affect transportation carbon emissions in this study.

Model comparative analysis
The input features of the training data set were the following provincial data of China from 2007 to 2016: population factors (passenger turnover and GDP per capita), economic factors (total GDP and freight turnover), and technology factors (proportion of natural gas and electricity, total energy consumption, and renewable energy). The output data were the transportation carbon emission values for each province over the same period. The testing data set consisted of the same variables for each province between 2017 and 2019.
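The chronological split described above can be sketched in a few lines of Python. The record layout and the province labels are hypothetical; only the year boundaries (2007-2016 for training, 2017-2019 for testing) and the count of 30 provinces come from the text.

```python
def split_by_year(records, last_train_year=2016):
    """Chronological split: years up to last_train_year train, later years test."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test

# Hypothetical placeholder panel: 30 provinces observed yearly from 2007 to 2019.
provinces = [f"province_{k:02d}" for k in range(1, 31)]
records = [{"province": p, "year": y} for p in provinces for y in range(2007, 2020)]

train, test = split_by_year(records)
print(len(train), "training samples,", len(test), "testing samples")
```

Splitting by year rather than at random keeps all test observations strictly later than the training observations, which avoids leaking future information into the model.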

Comparison of prediction accuracy
As displayed in figure 4 and table 2, the LSTM model had better prediction performance for transportation carbon emissions than the traditional neural network BP and GRNN models. Therefore, in this section, the LSTM model's prediction accuracy is compared with that of the BP and GRNN models.
The qualitative analysis was performed by comparing the predictions of the BP, GRNN, and LSTM models on the testing data set. The prediction results of the different models can be compared visually in figure 4. Generally, the prediction accuracy of the LSTM model on the testing data set is higher than that of BP and GRNN, and its predicted values fit the actual values better. Table 2 shows that the RMSE and MAE values of LSTM are the smallest, while the R 2 value of LSTM is the largest among the three models. Thus, the qualitative and quantitative analyses above demonstrated that the prediction performance of the LSTM model is better than that of BP and GRNN.

Comparison of LSTM and SSA-LSTM models
The indicators mentioned above were also used to verify the effect of the SSA optimization algorithm in improving the LSTM's forecasting ability. Figure 5 displays the transportation carbon emission values of 90 provincial samples during 2017-2019 predicted by the SSA-LSTM model, which fit the actual transportation carbon emissions data better. Table 3 compares the prediction results. The R 2 of the SSA-LSTM model reached 99.05%, whereas the R 2 of the original LSTM model was 98.17%, a relative improvement of 0.896%. The R 2 values thus showed that the goodness of fit of the SSA-LSTM model was better overall, although the difference was not large. The MAE decreased from 0.0217 for LSTM to 0.0133 for SSA-LSTM, which is 38.7% lower, and the RMSE decreased from 0.0312 to 0.0224, which is 28.21% lower, indicating that the optimized model's prediction accuracy improved significantly.
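The percentage changes quoted above are relative changes with respect to the plain LSTM values, which the following short Python check reproduces from the reported figures. The helper name `relative_change` is introduced here for illustration.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Reported values: plain LSTM first, SSA-LSTM second.
r2_gain = relative_change(98.17, 99.05)        # R2 given in percent
mae_drop = -relative_change(0.0217, 0.0133)    # reduction reported as positive
rmse_drop = -relative_change(0.0312, 0.0224)

print(f"R2 relative improvement: {r2_gain:.3f}%")
print(f"MAE reduction: {mae_drop:.1f}%")
print(f"RMSE reduction: {rmse_drop:.2f}%")
```

Note that the gain in R 2 is small in relative terms because both models already fit well, whereas the error measures, which start close to zero, show much larger relative reductions.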
As can be seen, the SSA-optimized LSTM neural network model proposed in this research portrayed the characteristics of transportation carbon emissions in China more accurately, and thus improved the prediction performance of the transportation carbon emissions prediction model and made more accurate predictions of the 2030 transportation carbon emissions peak.

Scenario analysis
A sustainable transportation system provides safe, inclusive, affordable, and clean passenger and freight mobility for current and future generations [27]. While much attention has been given to electric vehicles and renewable energy technologies to reduce transportation emissions, the population dynamics, culture and economic systems, urbanization level, and policy regulations also affect emissions from the transportation sector. Thus, systemic changes and innovations in these components allow transportation emissions to be decoupled without loss of economic activity [28][29][30]. Therefore, scenario analysis is generally explored to forecast the trend of carbon emissions. The actual environment that China faces and the potential scenarios of future transportation carbon emissions were considered to make the assumptions of the economic and social development scenarios for China during 2022-2036 (shown in table 4). Based upon the extended STIRPAT model, population, economic, and technology factors are taken as the determinants of transportation carbon emissions. This paper referred to the growth rates of China's population factors (passenger turnover and GDP per capita), economic factors (total GDP and freight turnover), and technology factors (proportion of natural gas and electricity, total energy consumption, and renewable energy) from 2010 to 2019, combined with previous scholars' research and relevant policy planning, to set the annual low and high rates of change (table 5). The transportation carbon emission determinants were divided into high and low rates of change to reflect the future development trends under different scenarios, as shown in table 6.
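Assigning a low or a high rate of change to each of the three factor groups yields 2 × 2 × 2 = 8 combinations, which is presumably how the eight scenarios arise. The Python sketch below enumerates these combinations; the ordering it produces is arbitrary and is not the scenario numbering of table 6.

```python
from itertools import product

FACTOR_GROUPS = ("population", "economic", "technology")
LEVELS = ("low", "high")

# Every combination of a low or high rate of change for each factor group.
# NOTE: the enumeration order is arbitrary and does NOT match table 6's numbering.
scenarios = [dict(zip(FACTOR_GROUPS, combo)) for combo in product(LEVELS, repeat=3)]

for index, scenario in enumerate(scenarios, start=1):
    print(index, scenario)
```

Enumerating the full factorial design makes it explicit that each scenario differs from its neighbors in exactly the level of one or more factor groups, which is what allows the effect of each group to be compared across scenarios.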

Population factor
The population is one of the critical factors that promote transportation carbon emissions, and two main variables were considered: passenger turnover and GDP per capita. In this study, with respect to the actual growth in passenger turnover and GDP per capita in China's economy and society during 2010-2019, the low and high rates of passenger turnover in the forecast interval of 2022-2036 were set to −0.041 and 0.109, respectively, while the low and high rates of GDP per capita were set to 0.065 and 0.109.
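One way to read these settings is as constant annual growth rates applied year after year over the forecast interval. The Python sketch below projects an input variable under that assumption; the compounding rule and the base value of 100 (an index for the year before the forecast starts) are assumptions made for illustration, while the two rates are the passenger turnover settings quoted above.

```python
def project(base_value, annual_rate, start_year=2022, end_year=2036):
    """Compound a constant annual growth rate over the forecast interval."""
    values = {}
    value = base_value
    for year in range(start_year, end_year + 1):
        value *= 1.0 + annual_rate
        values[year] = value
    return values

BASE_INDEX = 100.0  # hypothetical index value for the year before the forecast

low_path = project(BASE_INDEX, -0.041)   # low rate of passenger turnover
high_path = project(BASE_INDEX, 0.109)   # high rate of passenger turnover

print("2036 index, low rate:", round(low_path[2036], 1))
print("2036 index, high rate:", round(high_path[2036], 1))
```

Under this reading, the gap between the low and high paths widens every year, so the choice of rate matters far more at the end of the forecast interval than at its start.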

Economic factor
The total GDP and freight turnover influence transportation carbon emissions. According to the World Bank forecast scenario, the economic growth rate will show a long-term declining trend over time and then remain low [36]. Thus, with respect to the total GDP of the Chinese economy and actual growth of freight turnover during 2010-2019, the low and high rates of real GDP were set to 0.043 and 0.078, respectively, and the low and high rates of freight turnover were set to −0.078 and 0.138, in the forecast interval of 2022-2036.

Technology factor
China must focus on curbing the increase in carbon emissions from traffic by improving transportation emissions reduction technology, chiefly through the proportion of natural gas and electricity, total energy consumption, and renewable energy. The increase in energy demand will continue to slow in the future, while the diversification of the transportation energy structure has accelerated significantly. The technology factor settings refer to the views of the China Energy Outlook 2030 [37], combined with the growth pattern in China's economy during 2010-2019. Thus, in the forecast interval of 2022-2036, the low and high rates of energy consumption were set to −0.012 and 0.093, respectively, the low and high rates of the proportion of natural gas and electricity were set to 0.015 and 0.137, and the low and high rates of renewable energy were set to 0.003 and 0.12.
As figure 6 shows, transportation carbon emissions in China presented a 'rapid increase-fluctuating decrease-continuous increase' trend from 2010 to 2019. From 2010 to 2012, transportation carbon emissions increased rapidly, while from 2012 to 2019 they showed a fluctuating decrease followed by continuous growth.
The increase in emissions under scenario 2 was significantly higher than in the other scenarios, with high population factor growth and increasing transportation demand. Meanwhile, the slow technological change led to inadequate adjustment of the energy structure and only a slight improvement in overall energy efficiency. According to the forecast analysis, the goal of peaking carbon emissions from transportation by 2030 cannot be achieved in the 3rd, 5th, and 8th scenarios because of the low technological level, and in the 7th scenario because of rapid economic growth.
The increase in transportation carbon emissions in scenario 1 is considerably lower than in the other scenarios, because of the stable economic growth, slowly increasing population, high energy efficiency, improved technological innovation, and optimal energy structure. Under scenario 4, population factors and economic levels are expected to increase. Although the carbon emissions reduction technology is developing rapidly, transportation carbon emissions showed a trend of 'steady growth-fluctuating decline-rebound growth,' with the rebound in 2034. Scenario 6 showed a fluctuating downward trend in transportation carbon emissions. In this scenario, the population factor is expected to increase at a rapid rate, the economy maintains slow growth, the level of carbon emissions technology develops rapidly, and the energy efficiency overall and renewable energy technology increase, which is an ideal state, but not easy to achieve. The sustainable development of China's economy needs to balance economic development and energy conservation with emissions reduction, as economic development and population growth will lead to an increase in carbon emissions. Still, these two factors are difficult to adjust as the target, so technological progress should be used as a breakthrough to curb carbon emissions in transportation and contribute to the dual transportation carbon goals of carbon peaking and carbon neutrality in China.

Discussions
This study set high and low rates of change for multiple factors in China to reflect future transportation development trends under eight different scenarios, in which the changes in transportation carbon emissions differed significantly. Most of the predicted scenarios showed that it will be difficult to achieve peak transportation carbon emissions in 2030 in China, because of the slow technological development under the 2nd, 3rd, 5th, and 8th scenarios, and the rapid economic growth in the 4th and 7th scenarios. Notably, high-quality economic development and population-related transportation demand should be considered alongside advanced technology. Thus, systemic changes and innovative approaches are crucial to the Chinese government's ability to decarbonize.
Let us take the perspective of systemic changes first. Population factors, economic structure, and technology development are related highly to carbon emissions in the transportation sector. As China's urbanization continues, the expansion of urban areas, residents' increasing standard of living, and the increased transportation distance between residential areas and workplaces will generate more demand for passenger and freight transportation. Thus, a larger population and higher economic level increase transportation carbon emissions [39]. In contrast, economic growth is a prerequisite to meeting sustainable development, and reducing transportation carbon emissions cannot be achieved by slowing economic growth. To balance economic development and transportation carbon emissions, the main intervening factor to curb carbon emissions effectively is the continuous development of clean energy technology [40]. Meanwhile, sufficient attention should be paid to the population dynamics, culture, economic systems, urbanization level, and policy regulations related to the transportation sector.
Second, with respect to the transportation supply side, given that population size and economic level are two factors that are difficult to adjust, future efforts to mitigate the increase in carbon emissions should focus on improving transportation emission reduction technologies. Therefore, environmental regulation policies should be considered to control the waste and pollution attributable to energy consumption [41], and industrial restructuring needs to be facilitated to promote the digital transformation of transportation [42,43]. As a result, the share of renewable energy in transportation could be increased, and the development of renewable energy vehicles should be accelerated.
Third, with respect to the transportation demand side, it is necessary to reduce transportation demand to achieve very low-carbon scenarios through a combination of cultural change and a low-carbon lifestyle [31,32]. A move to a digital economy that allows workers to work and access information remotely could reduce travel demand significantly. Case studies have suggested that teleworking could reduce transportation emissions by 20% in some instances, but likely by 1% at most across the entire transportation system [44,45]. Information and communication technology (ICT) and smart city construction can be adopted as well to improve the transportation sector's efficiency. Further, digital technologies can make transit and active transportation more convenient than private car use by simplifying mobility options, which reduces unnecessary travel by personal vehicle.

Conclusions
A novel method was illustrated in this paper to screen the factors related to transportation carbon emissions. This study could be a potential reference for other low-carbon research under the background of double-carbon target. The main findings of this study were: First, the results showed that the determining factors selected by using LASSO were reasonable in this research. The LASSO model is influential in medicine, psychology, and neuroscience [46][47][48][49], and is also instrumental in economics. The LASSO-SSA-LSTM model combined with machine learning and deep neural network offered better prediction accuracy and goodness of fit compared with traditional neural networks, i.e. GRNN, BP, etc, which can improve the prediction performance in the field of economic research significantly and can be applied in other macroeconomic areas.
Second, by analyzing the variables that affect transportation carbon emissions, seven key influencing factors were identified in this study, including population, economic, and technology factors. Among them, increases in passenger turnover, GDP per capita, total GDP, freight turnover, and energy consumption increased transportation carbon emissions. However, the proportion of natural gas and electricity and renewable energy play a suppressing role in carbon emissions. Third, the LASSO-SSA-LSTM model was used to forecast China's transportation carbon emissions from 2022 to 2036, and the results suggested that transportation is one of the most difficult sectors to decarbonize [50]. The scenario with low population growth, steady economic growth, and high growth in transportation emissions reduction technology is likely to be the most sustainable way to develop transportation in China.
Finally, it is noteworthy that this study has certain limitations that need to be addressed in future research. Digital technologies, such as ICT, big data, and AI, are expected to play an important role in reducing transportation carbon emissions in the future, but their impact has not been fully considered in this study because of the lack of data, and further research could be expanded in this field. Furthermore, the national results offer limited insights into regional situations, since China's regional development and energy structures differ significantly. Regional transportation carbon emissions forecasts could be further investigated by using the proposed LASSO-SSA-LSTM model to give more support to the government and policy makers in the future.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.