Accounting CO₂ Emissions of the Cement Industry: Based on an Electricity–Carbon Coupling Analysis

Abstract: The cement industry is a significant contributor to carbon emissions in China, and China's national emission trading system has announced that the industry will soon be included in the system. However, current cement carbon accounting methods require high-resolution data from various processes on the production line, which makes accounting cumbersome and costly. To address this issue, this study explores the feasibility and reliability of using machine learning algorithms to develop electricity–carbon models. These models estimate carbon emissions based solely on electricity data, enabling faster and more cost-effective carbon accounting in cement production. This study investigates the correlations between electricity data and carbon emissions for a large cement manufacturer in southern China. It compares the performance of models based on the supply of electricity (purchased electricity and waste heat electricity) with those based on the consumption of electricity (electricity used by the grinding machines on the production lines) to identify the key factor for carbon emission calculations. The best-performing model showed high accuracy, with an R² of 0.96, an RMSPE of 3.88%, and a MAPE of 2.56%. Based on this, the novel electricity–carbon model has the potential to serve as an optional method for carbon emissions accounting in the cement industry and to help improve carbon emissions data within China's national emission trading system.


Introduction
More than 70 countries have pledged to achieve net-zero emissions under the Paris Agreement [1]. Carbon pricing, including instruments such as carbon taxes, carbon emissions trading systems (ETSs), carbon crediting mechanisms, and results-based climate finance, is a powerful tool to support national/regional plans for decarbonizing the economy [2]. Currently, there are 68 carbon pricing instruments in place, involving 34 ETSs and 37 carbon tax regimes and covering 23% of global carbon emissions [3]. After an 8-year pilot trial of seven regional ETSs, China established its first national ETS in July 2021, which became the world's largest carbon market by emissions and primarily involves the power generation industry [1,2,4]. During the first complete compliance cycle of China's national ETS, more than 2200 power enterprises participated, with a compliance rate of 99.5% [5,6]. Building on these successful trials, China's national ETS is considering gradually including more emission-intensive enterprises in seven major industrial sectors: petrochemical, chemical, building material (specifically cement clinker and plate glass production), iron and steel, non-ferrous metal, paper making, and domestic aviation [5]. However, the national ETS is still in its initial stage and faces various challenges, such as a limited market scale, an immature trading system, and an unsound legal system [7]. As China's national ETS continues to develop, the mechanisms for cap setting, permit allocation, MRV (Monitoring, Reporting, Verification), trading, and implementation need to be improved [8].
[...] data collection. This method offers the advantage of relying on statistically reliable data as opposed to scattered and variable point-source measurements. Electricity consumption data is generally considered to be highly dependable and is closely linked to both cement production and total energy consumption.
In a typical cement line, electricity consumption is dominated by three core processes, namely cement grinding (43%), calcination (33%), and raw material preparation (24%) [27]. The links among electricity consumption, coal consumption, production processes, and carbon emissions are summarized in Figure 1.
To reduce the complexity and time required to collect production-line information for carbon emission accounting, while also increasing the accuracy of carbon emission accounting in the cement industry to further support China's national ETS, this study proposes a novel electricity-carbon model in which carbon emissions are predicted solely from electricity data. A case study was conducted on a cement company located in southern China, assessing its historical electricity consumption and carbon emissions records for 2016, with data collected on a daily basis. The electricity data was grouped by function into supply data (Group 1, purchased electricity and waste heat electricity) and consumption data (Group 2, electricity usage of machines). Several machine learning algorithms were used to construct the electricity-carbon model and to validate its feasibility and reliability.

Literature Review
The cement industry's carbon emissions can be predicted quantitatively using various mathematical and machine learning (ML) models (Table 1). However, these models require multiple input factors, which can be challenging to obtain, leading to uncertainty in the data. To address this, some researchers have applied strict conditions or strengthened model restrictions in order to reduce the uncertainty [21,28–36]. Early attempts were made with the Long-range Energy Alternatives Planning System (LEAP) model and improved with expert judgement [28,33]. Other effective models include the Stochastic Impacts by Regression on Population, Affluence, and Technology (STIRPAT) model [34], the Back Propagation Neural Network (BPNN) [9,29], the Integrated MARKAL-EFOM System (TIMES) model for industry-level prediction [21,35], the Particle Swarm Optimization (PSO) algorithm [30,31], the system dynamics model [37], and the Verhulst Grey forecasting (V-GM) model for country-scale prediction [36]. These studies achieved relatively high accuracy by improving the models. However, few attempted to resolve the data uncertainty issue by narrowing the list of factors, even though most of them pointed out that the determining factors were related to energy efficiency and production.
Table 1 (excerpt): [37], Indonesian cement industry; factors: multiple (socio-economic, production, energy, technology); determined factor: cement production.
Other studies have highlighted a correlation between total electricity consumption and CO₂ emissions [38–42]. This correlation has been investigated in several studies on electricity consumption (some of which highlighted renewable energy intensity) at national and/or regional scales, for example, in GCC countries [38], ASEAN countries [39], China [40], Canada [41], and Ghana [42].
Although electricity consumption has been recognized as one of the key factors determining the overall carbon emissions of the cement production line [43,44], it accounts for a relatively moderate proportion of emissions [4] and has never been tested as a sole indicator for total emissions on an industry level.
In recent studies, ML has been tested and shown to be a powerful tool for predicting carbon emissions in the cement industry. Furthermore, it has also been widely applied throughout various industries, e.g., transportation [45], construction [46], and power generation [47]. Therefore, using ML to predict cement industry CO₂ emissions with electricity consumption data is a promising approach. However, despite their accuracy, these previous models still require extensive data collection efforts. Thus, this study is a pioneering effort in integrating the sole indicator of electricity data and the ML method into carbon emissions estimation on a factory scale.

Case Study and Data Sources
This research focuses on a cement manufacturer located in southern China, with a daily production capacity of 22,000 t and four cement production lines using second-generation dry technology. The production line inventories and electricity consumption were recorded and reported by the manufacturer on a daily basis from 2 January to 31 December 2016. The raw data includes the quantity and quality of raw materials and clinkers, coal consumption in rotary kilns, the electricity consumption of raw material mills, clinker mills, and cement mills, as well as the amount of electricity generated from waste heat and purchased from external sources.

Carbon Emission Accounting
Currently, the official methodology for carbon emission accounting in the cement industry is an EFA method given by The Guidance for Cement Industry Greenhouse Emissions Monitoring and Reporting in China (published by China's National Development and Reform Commission), which is briefly summarized in Equations (1)–(3). Here, C_total is the total carbon emissions for a typical cement production cycle, consisting of two major components: the emissions from energy consumption (C_energy, Equation (2)) and the emissions released from the manufacturing process (C_process, Equation (3)). The calculation of C_energy is delineated in Equation (2), where emission factors (EF_i,j) have been assigned to each of the emission activities (AD_i,j). This study mainly considers the emissions from on-site fossil fuel combustion (AD_i) and purchased electricity generated from non-renewable sources (AD_j*). In practice, the manufacturing plants are also supported by the electricity recycled from waste heat during calcination processes and the electricity generated from renewable sources, both of which are considered zero-emission sources and are excluded from this calculation. The process emission (C_process) is mainly derived from the calcination process and is quantified from the carbonate-bound Ca and Mg in the ore materials [48]. In Equation (3), Q_ck represents the amount of cement clinker production, and FR*_CaO and FR*_MgO represent the CaO and MgO from the carbonate component in the clinker, respectively, which are calculated from the mineral composition of the raw materials. The total emission (TC), as the sum of the four production lines, was calculated for each observed day (Table 2).
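As a sketch of how Equations (1)–(3) could be written from the symbol definitions above: the index structure of Equation (2) and the stoichiometric factors in Equation (3) are assumptions. The factors 44/56 and 44/40 are the CO₂:CaO and CO₂:MgO molar mass ratios implied by carbonate decomposition; the official guidance may state the terms differently.

```latex
\begin{align}
C_{\mathrm{total}} &= C_{\mathrm{energy}} + C_{\mathrm{process}} \tag{1}\\
% Assumed form: fuel-combustion activities i plus purchased non-renewable electricity j
C_{\mathrm{energy}} &= \sum_{i} AD_{i}\, EF_{i} + AD_{j}^{*}\, EF_{j} \tag{2}\\
% Assumed form: carbonate-derived CaO and MgO converted to CO2 by molar mass ratio
C_{\mathrm{process}} &= Q_{ck}\left( FR^{*}_{\mathrm{CaO}} \cdot \frac{44}{56}
  + FR^{*}_{\mathrm{MgO}} \cdot \frac{44}{40} \right) \tag{3}
\end{align}
```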

Electricity Data
Total electricity consumption can be determined from either the supply end or the consumption end. Therefore, five types of electricity data have been identified as the representative factors for determining total carbon emissions, categorized as either supply factors (Group 1) or consumption factors (Group 2). In this case, the supply factors (Group 1) include both net purchase power (NPP) and waste heat power generation (WHG). The consumption factors (Group 2) consist of cement grinding electricity consumption (CG), clinker production electricity consumption (CP), and other production process electricity consumption (OP). The relationship between these factors can be summarized as follows:
Total Electricity Supply/Consumption = NPP + WHG = CG + CP + OP
The supply and consumption electricity data were also calculated as the daily sum over all the production lines for the observation period and were compared against the corresponding total emissions (Table 2). The electricity-carbon models were then constructed for both Group 1 vs. TC and Group 2 vs. TC, with the aim of comparing the model dependencies on supply and consumption data.
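The grouping and the balance relation above can be illustrated with a minimal sketch. The class name, the MWh unit, the 1% tolerance, and the sample values are all hypothetical and are not taken from the study's dataset.

```python
from dataclasses import dataclass


@dataclass
class DailyElectricity:
    """One day's plant-wide electricity record (hypothetical unit: MWh)."""
    npp: float  # net purchase power (Group 1, supply)
    whg: float  # waste heat power generation (Group 1, supply)
    cg: float   # cement grinding consumption (Group 2, consumption)
    cp: float   # clinker production consumption (Group 2, consumption)
    op: float   # other production process consumption (Group 2, consumption)

    def supply_features(self):
        """Group 1 inputs for an electricity-carbon model."""
        return [self.npp, self.whg]

    def consumption_features(self):
        """Group 2 inputs for an electricity-carbon model."""
        return [self.cg, self.cp, self.op]

    def is_balanced(self, rel_tol=0.01):
        """Check NPP + WHG = CG + CP + OP within a relative tolerance."""
        supply = sum(self.supply_features())
        consumption = sum(self.consumption_features())
        return abs(supply - consumption) <= rel_tol * max(supply, consumption)


# Hypothetical daily record: supply and consumption both total 1000 MWh.
day = DailyElectricity(npp=600.0, whg=400.0, cg=390.0, cp=490.0, op=120.0)
print(day.is_balanced())
```

A check of this kind could flag days on which metering gaps make the supply and consumption totals disagree before the data are used for modelling.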

Models
In this study, the electricity-carbon models were constructed with six different ML algorithms, which used Group 1 (supply) or Group 2 (consumption) electricity factors to predict the total carbon emissions of a cement production site. The chosen algorithms were linear regression (LR), polynomial regression (PR), the artificial neural network (ANN), least absolute shrinkage and selection operator regression (Lasso), ridge regression (Ridge), and the k-nearest neighbor (kNN) algorithm, which have been recognized as the most widely applied algorithms in scenario estimations. The constraints were tuned to maximize the performance of each model with regard to the data size (Table 3). The 7:3 training:test ratio was chosen based on the simultaneous distribution principle, which allows a similar allocation frequency for the values of the target variable. With these constraints, each machine learning model was trained on a random selection of 70% of the whole dataset of electricity and carbon emissions. Then, the model was applied to the remaining 30% of the electricity data to calculate the predicted carbon emissions, which could be compared with the observed total carbon emissions (TC). The electricity factors for Group 1 and Group 2 were evaluated separately; the models were independently set up and tested for each group against the total emission.
Energies 2023, 16, 4453
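The train/test procedure can be sketched as follows, using ordinary least squares as a stand-in for the LR model. This is an illustration under stated assumptions, not the study's implementation: the data are synthetic, the coefficients (1000, 8, 12) and noise level are invented, and the tuned settings of Table 3 are not reproduced.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(features, targets):
    """Ordinary least squares via the normal equations; returns [intercept, w1, w2, ...]."""
    design = [[1.0] + list(f) for f in features]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, features):
    """Apply fitted coefficients to new feature rows."""
    return [coefs[0] + sum(w * x for w, x in zip(coefs[1:], f)) for f in features]


# Synthetic stand-in for daily (NPP, WHG, TC) records; all numbers are hypothetical.
rng = random.Random(42)
rows = []
for _ in range(200):
    npp = rng.uniform(400.0, 800.0)
    whg = rng.uniform(300.0, 500.0)
    tc = 1000.0 + 8.0 * npp + 12.0 * whg + rng.gauss(0.0, 200.0)
    rows.append((npp, whg, tc))

train, test = train_test_split(rows)  # 70% for training, 30% for testing
coefs = fit_linear([r[:2] for r in train], [r[2] for r in train])
predicted = predict(coefs, [r[:2] for r in test])
```

The predicted values for the held-out 30% can then be compared against the observed TC values with the evaluation indices of the next section.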

Evaluation Indices
The performance of machine learning algorithms is commonly evaluated by comparing the difference between the predicted value and the observed value. Commonly used evaluation indices include the Root Mean Square Percentage Error (RMSPE), Mean Absolute Percentage Error (MAPE), and coefficient of determination (R²) [49–51]. Thus, this paper used these three statistical indicators to evaluate and analyze the fitting results of the regression models.
MAPE is a measure of prediction accuracy that calculates the mean of the absolute residuals between the predicted and true values, each expressed relative to the true value. RMSPE measures the deviation between the predicted and true values and is more sensitive to outliers in the data. R² reflects the degree of fit of the model to the sample data. In general, the closer the RMSPE and the MAPE are to 0, the higher the prediction accuracy, whereas the closer the R² is to 1, the better the fit. The equations for these measures are as follows:
$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{f_i - y_i}{y_i}\right)^2} \times 100\%$$
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{f_i - y_i}{y_i}\right| \times 100\%$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(f_i - y_i)^2}{\sum_{i=1}^{n}(\bar{y} - y_i)^2}$$
where $n$ is the number of observed samples, $f_i$ is the value of the target variable predicted by the regression model, $y_i$ is the true value of the target variable, $\bar{y}$ is the average value of the observed samples, $\sum_{i=1}^{n}(f_i - y_i)^2$ is the error generated by the prediction, and $\sum_{i=1}^{n}(\bar{y} - y_i)^2$ is the error generated by the average value.
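A minimal sketch of the three indices, written from their standard definitions in terms of the predicted values f_i and true values y_i. The function names and the three-point sample are hypothetical and only serve to show the calculation.

```python
import math


def mape(predicted, observed):
    """Mean Absolute Percentage Error, in percent."""
    n = len(observed)
    return 100.0 / n * sum(abs((f - y) / y) for f, y in zip(predicted, observed))


def rmspe(predicted, observed):
    """Root Mean Square Percentage Error, in percent."""
    n = len(observed)
    return 100.0 * math.sqrt(
        sum(((f - y) / y) ** 2 for f, y in zip(predicted, observed)) / n
    )


def r_squared(predicted, observed):
    """Coefficient of determination: 1 - (prediction error) / (error of the mean)."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((f - y) ** 2 for f, y in zip(predicted, observed))
    ss_tot = sum((mean_y - y) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted daily emissions (arbitrary units).
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(round(mape(predicted, observed), 3))       # 5.0
print(round(rmspe(predicted, observed), 3))      # 6.455
print(round(r_squared(predicted, observed), 4))  # 0.9957
```

Because squaring weights large relative errors more heavily, the RMSPE is never smaller than the MAPE on the same data, which is why it reacts more strongly to outliers.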

Cross Validation
In order to decrease the uncertainty in the dataset and test the robustness of the model, we also conducted a k-fold cross validation test. The k-fold cross validation is one of the most popular cross validation methods [52]. It validates the models by partitioning the dataset randomly and equally into k folds. Each fold is then used in turn as the test set, while the remaining k − 1 folds are combined as the training set. The variance observed across the validation tests shows whether the model can be considered valid. In this study, we set k to 10.
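The fold rotation described above can be sketched as follows. The helper names, the noise-free toy data, and the one-parameter model are hypothetical; the study's own models and scores are not reproduced here.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k folds of (nearly) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Use each fold once as the test set and the other k - 1 folds for training.

    fit_and_score(train_idx, test_idx) must return one score per fold.
    Returns the mean and the population variance of the k scores.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return statistics.mean(scores), statistics.pvariance(scores)


# Hypothetical noise-free data: emissions exactly proportional to electricity.
xs = [float(i) for i in range(1, 51)]
ys = [3.0 * x for x in xs]


def fit_and_score(train_idx, test_idx):
    """Fit a slope through the origin on the training folds; return test MAPE (%)."""
    slope = sum(xs[i] * ys[i] for i in train_idx) / sum(xs[i] ** 2 for i in train_idx)
    return 100.0 * statistics.mean(
        abs((slope * xs[i] - ys[i]) / ys[i]) for i in test_idx
    )


mean_score, score_variance = cross_validate(len(xs), fit_and_score, k=10)
```

A small variance across the ten fold scores indicates that the model's accuracy does not depend strongly on which samples happen to fall into the test set.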

Contribution of Electricity Consumption and Carbon Emissions
The emission from electricity appears to be a minor contributor to total carbon emissions in cement production, accounting for just 3.47% at our studied site. A total of 41.46% of the electricity is supplied by the waste heat recycling system, which does not contribute to any emissions.
There are two key processes in the cement production line: clinker production and cement grinding. These are responsible for the majority of the electricity usage (48.67% and 39.22%, respectively) and carbon emissions (98.24% and 1.45%, respectively). The distributions of electricity consumption and carbon emissions across the key processes are shown in Figure 2.

Comparison between Machine Learning Algorithms
The results of the electricity-carbon models built with each of the six ML algorithms are shown in Tables 4 and 5, where the values were calculated according to Equations (4)–(6). Table 4 shows the results from Group 1. Overall, the coefficient of determination (R²) was high for every model. The values from the validation set are consistent with the values from the test set, which demonstrates the robustness of the test-set results. There were no significant differences among the R² values from the test set, which all ranged from 0.9300 to 0.9620. The kNN model differed most from the others and also showed the largest error rate, with an RMSPE of 17.94% and a MAPE of 5.81%. When comparing the MAPE and RMSPE, the PR model had the lowest error rate, with a MAPE of 2.56% and an RMSPE of 3.88%, whereas the other models had slightly higher error rates. In terms of the MAPE, all of the selected models performed well; the highest value was 5.82%. As for the RMSPE, the range of values was wider than for the other two indices; the largest value reached 17.94% (from the kNN model) and the smallest value was 3.88% (from the PR model). Generally, the R² of the training set was always higher than that of the test set, and the RMSPE was much larger than the MAPE. The results from the validation set and their variance show consistency between the training set, test set, and validation set.
For Group 2, the general performance was slightly worse than for Group 1. Table 5 shows the details. The ANN model had the smallest error rates, whereas its R² was higher only than that of the kNN model. The three evaluation indices from the LR, Lasso, and Ridge models shared the same values, with an RMSPE of 7.65%, a MAPE of 5.36%, and an R² of 0.8787.
Aligned with the results of Group 1, the kNN model showed the worst results among the various models in Group 2. The values from the validation set were consistent with the values from the training set and test set, which indicates that the models are valid.
Based on the values of R², RMSPE, and MAPE, the fitting performances of the ML models are shown in Figure 3, where (a) shows the results from Group 1 and (b) those from Group 2. To illustrate the effectiveness of the models, we drew the fitting charts of two different models, one from each group. The PR model was chosen from Group 1, as it was the best model, with the highest R² and the smallest error rate. The best model from Group 2 was the ANN model. In the charts, the fitted curves in (a) performed well and almost resemble the tie lines, which means that most of the predicted values and the actual values match each other. Compared with (a), the curves in (b) fluctuated more, which indicates lower accuracy. There was no significant difference between the two models; only the discrete points were distributed slightly differently. Figure 3b showed a larger distance between the discrete points, whereas those in (a) stayed closer to the fitted curve, showing that the best model from Group 2 had a higher error rate than the best model from Group 1, though this did not have any significant impact on the model fitting performance.
In summary, both the result tables and the fitting charts show the effectiveness of the models. The models from Group 1 performed better than those from Group 2, and the same held for the best model of each group: the PR model, which worked best in Group 1, also showed better values than the ANN model, which was the best model from Group 2.

Discussion
This study investigated a high-resolution dataset of electricity-carbon coupling collected from a typical cement production factory and demonstrated that it is possible to predict the total carbon emissions of cement production lines with a narrow range of errors using its electricity data, even though electricity is only a subsidiary source of carbon emissions. Based on this observation, we constructed and compared several new electricity-carbon models using various machine learning algorithms. The overall performance of the electricity-carbon models was very good, which shows the potential of a faster and cheaper approach to accounting for and monitoring emissions in the cement industry.
Electricity usage and carbon emissions are both determined by the quantity of clinker processed on the manufacturing line, which makes the two factors intrinsically correlated. This correlation forms the basis for the electricity-carbon models. The statistical results of this study conform with previous research and suggest that the clinker production process is the largest contributor to carbon emissions in the cement production line through CaCO₃ calcination (~61%) and fossil fuel combustion (~35%) [4,53–55]. Clinker production also requires electricity to support its associated mechanical motions, like grinding and rotating, which establishes an interdependence between calcination emissions, coal consumption, and electricity usage in this process. For the most part, the clinkers are directly converted to cement, which involves electricity consumption for cement grinding and further strengthens the correlation between electricity and overall clinker production. Occasionally, the cement manufacturer might sell clinkers as final products in response to market demands. In this case, the market impact on clinker/cement production can be reflected in the bimodal distribution of clinker production quantity (Figure 3). When clinker production is high, the interconnection between electricity usage and carbon emissions weakens, potentially indicating a mismatch between clinker and cement production.
Though the electricity-carbon models with different algorithms and variables give consistently good results, the ones using Group 1 (supply) electricity data in general outperform the ones using Group 2 (consumption) electricity data. This might result from the emission discount on electricity generated by the waste heat recovery technology. When the clinkers are heated to a high temperature in the kilns, they generate high-temperature gases which can be utilized to drive steam turbines to generate electricity [20,56]. Electricity generated from this technology does not induce extra carbon emissions and is closely correlated to the quantity of clinkers produced and the temperature in the kilns [21,36]. Therefore, the prediction of emissions is better constrained by incorporating knowledge from the waste heat electricity data. Though the Group 2 data had higher granularity regarding processes on the cement production line, the Group 1 data had a better focus on the major carbon emission contributor (the clinker production process), reducing the uncertainty generated from the variation in waste heat generation and local market demands.
The performance of the different models was consistent. The complex models (such as Ridge and kNN) give predictions similar to, if not worse than, those of the simple models (such as LR and PR). This is on account of the broad range of input values and the tight linear correlation between the inputs (electricity data) and the outputs (emission data). The results of this study reaffirm the feasibility and reliability of predicting carbon emissions based on electricity data in the cement industry.
The main advantage of this novel electricity-carbon model is that it provides a low-cost, low-effort, and highly reliable method to account for and/or monitor carbon emissions in the cement production line. The key factors for calculating total emissions identified in this case study were the amount of electricity purchased from external sources and the amount generated from waste heat. These data can be easily accessed and validated during regular daily factory management. Compared to the current EFA accounting method applied in China's national ETS and the accounting methods and models suggested by previous studies [21,28–36], the newly developed model requires significantly fewer variables. Hence, it avoids time lags and costly efforts in data collection.
The electricity-carbon model has the potential to be used in the cement industry on a broader scale. This case study is based on production lines using second-generation dry technology (a novel suspension preheater rotary kiln), which is adopted by approximately 99% of the cement plants in China [57]. The waste heat recovery system, which was integrated into the production lines in this case, has also gained popularity over the past two decades due to increased government incentives [21,36]. The magnitude and efficiency of waste heat recovery vary between plants and production lines, which would require model calibration on a case-by-case basis. Recent innovations in low-carbon cement production [55,58], such as the addition of recycled materials, embodiment of gaseous CO₂, and other carbon offset measures listed by China's national ETS, are not included or examined in this case study, but could affect the accuracy of this novel electricity-carbon model. As more cases are investigated in the future, the current electricity-carbon model will be further tested and refined, and it holds great economic potential.

Conclusions
This study demonstrated a strong correlation between electricity data and total carbon emissions in cement production processes. The electricity-carbon models were generated and compared across two groups of electricity data and six machine learning algorithms. Total carbon emissions can be predicted from electricity data with high confidence levels regardless of the choice of algorithm. The prediction is generally more accurate based on electricity data from the supply end (R² ≈ 0.96, Group 1: purchased electricity and electricity generated from waste heat) than on data from the consumption end (R² ≈ 0.87, Group 2: electricity usage of each grinding machine on the production line). As second-generation dry technologies for cement production and waste heat recovery technologies are commonly used in the industry, these novel electricity-carbon models have the potential to be applied on a broader scale, providing an easier, cheaper, and faster solution for validating carbon emissions from cement production lines.
Although the novel electricity-carbon model performs well in the selected case, further modification and validation are needed before practical application. To enhance the efficiency and effectiveness of carbon emission accounting approaches, more cases from various regions, scales, and technologies are needed. Integrating the newly developed low-carbon innovations of cement production into the investigation will be another direction to guide the application of this novel accounting model towards cement industry decarbonization strategies. Furthermore, to address the initial concern of the study regarding reducing the uncertainty of carbon emission accounting approaches, there is a need to explore further policy implications beyond the novel model. Examples include improving the data collection system to increase data precision at the original sources, introducing more automated accounting approaches to replace manual accounting, and allowing other accounting approaches to assist and support China's national ETS.