MSA-Net: A Precise and Robust Model for Predicting the Carbon Content on an As-Received Basis of Coal

The carbon content as received (Car) of coal is essential for the emission factor method in the IPCC methodology. Traditional carbon measurement relies on detection equipment, resulting in significant detection costs. To reduce these costs and provide precise predictions of Car even when measurements are unavailable, this paper proposes a neural network combining an MLP with an attention mechanism (MSA-Net). In this model, an Attention Module is proposed to extract important and potential features, Skip-Connections are utilized for feature reuse, and the Huber loss is used to reduce the error between predicted and actual Car values. The experimental results show that when the input includes eight measured parameters, the MAPE of MSA-Net is only 0.83%, better than the state-of-the-art Gaussian Process Regression (GPR) method. MSA-Net also exhibits better predictive performance than MLP, RNN, LSTM, and Transformer models. Moreover, this article provides two measurement solutions for thermal power enterprises to reduce detection costs.


Introduction
Excessive global CO2 emissions will exacerbate the greenhouse effect, leading to extreme weather events around the world and greatly impacting human survival and development [1]. Although many researchers have studied carbon emissions [2], carbon reduction [3], and other related issues [4,5], the global impact of the greenhouse effect is still intensifying. Energy-related CO2 emissions are a significant contributor to global CO2 emissions. According to the 2022 CO2 emissions statistics published by the International Energy Agency (IEA) in March 2023, global energy-related CO2 emissions exceeded 36.8 Gt in 2022, an increase of approximately 0.9% compared to 2021 [6]. As a major carbon emitter, China had energy-related CO2 emissions of 10.2 Gt in 2022 [6], accounting for approximately 27.7% of the global total. Among them, the CO2 emissions generated by coal consumption in the thermal power industry account for nearly 40%. Therefore, China has focused on CO2 emission management in the thermal power industry, a highly energy-consuming sector, and has conducted extensive research on this topic [7,8].
In the carbon accounting of the thermal power industry, multiple parameters need to be measured to ensure the accuracy of CO2 emission data. According to GB/T 32151.1-2015 "Greenhouse Gas Emission Accounting and Reporting Requirements Part 1: Power Generation Enterprises" [9] and the "Guidelines for Enterprise Greenhouse Gas Emission Accounting and Reporting" issued by the Ministry of Ecology and Environment in 2022 [10], the measured parameters related to CO2 emissions from coal combustion mainly include furnace coal weight (FC), total moisture (Mt), carbon content as received (Car), net calorific value as received (NCV), moisture (Mad), total sulfur (St,ad), hydrogen (Had), ash (Aad), volatile matter (Vad), and fixed carbon (FCad). The last six parameters are expressed as percentages on an air-dried basis. FC, Car, and the carbon oxidation rate OF are used for CO2 emission calculation. Therefore, the measurement of Car directly affects the carbon accounting results. If Car is not measured, or the measurements do not meet the requirements, the default value of carbon content per unit calorific value (CC) must be used together with the NCV for conversion. Although the default value of CC has been lowered from 0.03356 tC/GJ to 0.03085 tC/GJ [11], the CO2 emissions calculated from this default value are still high compared to the actual emissions, resulting in thermal power enterprises bearing additional compliance costs.
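The emission factor calculation from FC, Car, and OF can be sketched as follows. The 44/12 factor is the standard molar-mass conversion from carbon to CO2; the exact formula and unit conventions of GB/T 32151.1 are not reproduced here, so treat this as an illustrative assumption.

```python
def co2_emissions_t(fc_t, car_pct, of):
    """Estimate CO2 emissions (t) from coal combustion via the emission
    factor method: the carbon actually oxidized is converted to CO2 by
    the molar-mass ratio 44/12.

    fc_t    : furnace coal weight FC (t)
    car_pct : carbon content as received Car (%)
    of      : carbon oxidation rate OF (fraction, e.g. 0.99)
    """
    return fc_t * (car_pct / 100.0) * of * (44.0 / 12.0)
```

For example, 1000 t of coal at Car = 60% and OF = 0.99 yields roughly 2178 t of CO2 under this sketch.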
In order to ensure the accuracy of carbon accounting, reduce detection costs, and allocate reasonable compliance costs, many researchers have considered how to make reasonable predictions of carbon content when the measured data are incomplete during the statistical period. With the improvement of carbon accounting methods in thermal power enterprises and the further clarification of measured parameters, carbon content has become a key parameter that can be predicted from the existing measured parameters. Zhang et al. combined an attention mechanism with a bidirectional ResNet-LSTM to propose the ABRF model [12]. However, this model is only applicable to a single type of coal, which is inconsistent with the actual situation where thermal power plants usually burn blends of multiple coals [13,14], resulting in poor applicability. Deng et al. proposed a simulated annealing differential evolution (SADE) neural network to predict the composition of coal [15]. However, this method can only predict the fixed carbon content and cannot further predict the elemental carbon content. Guan et al. proposed a method for predicting the carbon content of coal powder using Double Spectral Correction Partial Least Squares (DCS-PLS) [16]. However, the prediction relies on signals collected by Laser-Induced Breakdown Spectroscopy (LIBS); signal acquisition is still required in application, which increases detection costs. Jo et al. proposed an ANN-based method for predicting coal elemental composition, but the model design is too simple and the prediction accuracy is low [17]. In addition, Ceylan et al. used moisture, ash, volatile matter, and fixed carbon as model inputs with a Gaussian Process Regression (GPR) model to predict elemental carbon content [18]. However, the selection of kernel functions and hyper-parameters is difficult and requires extensive experience-based tuning. Yin et al. used a Multi-Layer Perceptron (MLP) for carbon content prediction and applied it to CO2 emission calculation [19]. Although good prediction accuracy was achieved, the choice of input parameters was still incomplete: Mt, Mad, St,ad, and Had, which are also correlated with carbon content, were not considered. This limits the prediction accuracy of such models. Moreover, if more parameters are utilized, the model needs to fully explore the interaction mechanism between parameters and be able to attend to important and potential features (formed by Linear-layer mapping).
Firstly, a qualitative analysis was conducted on the coal parameters related to Car, and the input parameters for Car prediction were determined. Secondly, using all of these parameters, this paper introduces the attention mechanism [20] into the model design and proposes an Attention module that attends to important and potential features when predicting Car. Then, feature reuse is achieved through Skip-Connections [21] to improve feature utilization efficiency, forming the MSA-Net model. Next, to improve the convergence of model training, Huber loss is chosen as the loss function. Finally, to validate the effectiveness of the proposed method, the collected coal parameters were preprocessed and training and prediction datasets were constructed. Through quantitative and qualitative comparative experiments, the effectiveness and reliability of MSA-Net were verified. Meanwhile, based on comparative results under different data split methods, corresponding solutions are proposed for the practical application of MSA-Net in thermal power enterprises.

Analysis of Coal Carbon Content as Received
The main parameters for carbon accounting in thermal power enterprises are the coal consumption and the carbon content as received of the coal. To predict Car, this paper analyzes it based on the relationship between the dry-basis carbon content Cd and the parameters obtained from proximate and ultimate analysis [22]. In summary, considering the principles of coal detection and analysis in China [23-25], this paper takes Mt, Mad, Aad, Vad, FCad, Had, St,ad, and NCV as inputs to the prediction model, with Car as the output. We used a total-moisture analyzer, 5E-MW6510 (Automatic Moisture Analyser, Changsha Kaiyuan Instruments Co., Ltd., Changsha, China), to determine Mt according to GB/T 211-2017 [26]. Mad, Aad, Vad, and FCad were measured by a proximate analysis instrument, 5E-MAG6700 (Proximate Analyzer, Changsha Kaiyuan Instruments Co., Ltd.), according to GB/T 212-2008 [27]. St,ad was measured by an automatic coulomb sulfur analyzer, 5E-AS3200B (Changsha Kaiyuan Instruments Co., Ltd.), according to GB/T 214-2007 [28]. Cad and Had were measured by a carbon-hydrogen-nitrogen elemental analyzer, 5E-CHN2200 (C/H/N Elemental Analyzer, Changsha Kaiyuan Instruments Co., Ltd.), according to GB/T 476-2008 [29]. NCV and Qgr,ad were measured by a calorimeter, 5E-C5500 (Automatic Calorimeter, Changsha Kaiyuan Instruments Co., Ltd.), according to GB/T 213-2008 [25].

Coal Carbon Content as Received Prediction Model
In this section, the eight parameters introduced in Section 2 are first used as model inputs to provide comprehensive reference parameters for Car prediction. Secondly, to extract important and potential features, an Attention module suited to predicting Car in coal combustion is proposed; by combining and mapping the input or intermediate-layer results, it outputs features suitable for Car prediction. Then, the proposed Attention module is added to the model design, and Skip-Connections are used for feature reuse to build the MSA-Net model. Subsequently, Huber loss is adopted as the loss function to train MSA-Net until convergence. Finally, the prediction of Car in coal combustion is achieved.

Attention Module
In this paper, we refer to the design of the attention mechanism in Natural Language Processing (NLP) [30] and construct an Attention module for predicting Car; its structure is shown in Figure 1. For the input parameters or mapped features, three Linear layers (Linear_A_1, Linear_A_2, Linear_A_3) are used to obtain the corresponding Q (query), K (key), and V (value). The similarity between Q and K^T is calculated using matrix multiplication and normalized with SoftMax(·) to obtain the weights. The weighted sum of the weights and V gives the output feature F, calculated as F = SoftMax(QK^T)V.
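The Q/K/V computation described above can be sketched in PyTorch as follows. Treating the N input features of each sample as a length-N sequence of scalars, and omitting any scaling term (the text mentions none), are assumptions about Figure 1, which is not reproduced here.

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Sketch of the Attention module: three Linear layers map the input
    to Q, K, and V; SoftMax(Q K^T) gives the weights, and the weighted
    sum with V is the output feature F."""
    def __init__(self, n):
        super().__init__()
        self.linear_a_1 = nn.Linear(n, n)  # -> Q
        self.linear_a_2 = nn.Linear(n, n)  # -> K
        self.linear_a_3 = nn.Linear(n, n)  # -> V

    def forward(self, x):                  # x: (batch, N)
        q = self.linear_a_1(x)
        k = self.linear_a_2(x)
        v = self.linear_a_3(x)
        # (B, N, 1) @ (B, 1, N) -> per-sample (N, N) similarity matrix
        w = torch.softmax(q.unsqueeze(2) @ k.unsqueeze(1), dim=-1)
        return (w @ v.unsqueeze(2)).squeeze(2)   # F: (batch, N)
```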

MSA-Net Model
This paper introduces the proposed Attention module into a model based on the basic structure of an MLP. At the same time, to better extract important and potential features, Skip-Connections are used to sum shallow features with mapped features, enabling feature reuse and guiding the synthesis of new features. The network structure is shown in Figure 2. MSA-Net consists of two similar structures (Step-1 and Step-2). In Step-1, the input is the measured coal parameters; the potential features are generated by the Attention module. Three Linear layers and two activation layers are used for feature integration and mapping, and Skip-Connections with a Sum operation provide feature reuse; the output is the mapped feature. Step-2 takes the mapped feature from Step-1 as input and has a similar structure, but the output dimension of its last Linear layer is set to 1 for Car prediction. If N measured parameters are used as input, the corresponding trainable parameters are shown in Table 1. In the experiments, the hidden-layer mapping dimension M is set to 128, and α is set to 0.01 for LeakyReLU.
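The two-Step structure can be sketched as below. The exact wiring of the Linear layers, activations, and the Sum (including the projection used so that the skip branch matches the hidden dimension) is an assumption, since Figure 2 is not reproduced here; M = 128 and LeakyReLU α = 0.01 follow the text.

```python
import torch
import torch.nn as nn

class Step(nn.Module):
    """One Step of MSA-Net: an Attention block, three Linear layers with
    two LeakyReLU activations, and a Skip-Connection summed into the
    hidden features (wiring assumed)."""
    def __init__(self, n_in, n_out, m=128):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(n_in, n_in) for _ in range(3))
        self.fc1 = nn.Linear(n_in, m)
        self.fc2 = nn.Linear(m, m)
        self.fc3 = nn.Linear(m, n_out)
        self.skip = nn.Linear(n_in, m)   # projects the input for the Sum (assumed)
        self.act = nn.LeakyReLU(0.01)

    def forward(self, x):                # x: (batch, n_in)
        qv, kv, vv = self.q(x), self.k(x), self.v(x)
        w = torch.softmax(qv.unsqueeze(2) @ kv.unsqueeze(1), dim=-1)
        f = (w @ vv.unsqueeze(2)).squeeze(2)          # attention feature
        h = self.act(self.fc1(f))
        h = self.act(self.fc2(h)) + self.skip(x)      # feature reuse via Sum
        return self.fc3(h)

class MSANet(nn.Module):
    def __init__(self, n_inputs=8, m=128):
        super().__init__()
        self.step1 = Step(n_inputs, m, m)
        self.step2 = Step(m, 1, m)       # last Linear outputs one value: Car

    def forward(self, x):
        return self.step2(self.step1(x))
```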

Loss Function
To ensure the robustness of the model to outliers during training, while also ensuring that the gradient gradually decreases as the loss approaches its minimum, this paper uses the Huber loss Lδ(·) as the loss function. With y the measured Car and y' the prediction of the algorithm or model, Lδ(y, y') = 0.5(y − y')^2 if |y − y'| ≤ δ, and δ(|y − y'| − 0.5δ) otherwise. In the experiments, δ is set to 1.0.
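A scalar sketch of the Huber loss makes its two regimes explicit: quadratic near zero (so the gradient shrinks as the loss approaches its minimum) and linear in the tails (so outliers are not squared).

```python
def huber(y_true, y_pred, delta=1.0):
    """Huber loss for a single pair: 0.5*r^2 for |r| <= delta,
    delta*(|r| - 0.5*delta) otherwise."""
    r = abs(y_true - y_pred)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)
```

For instance, huber(0, 0.5) falls in the quadratic regime (0.125), while huber(0, 3) falls in the linear regime (2.5); in PyTorch the equivalent is `nn.HuberLoss(delta=1.0)`.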

Complete Dataset Production
We collected the measured data of a typical thermal power enterprise from 1 September 2022 to 31 August 2023. This enterprise has two generating units, and daily coal quality analysis and parameter measurements are performed on the coal consumed by both. Due to shutdowns and routine maintenance, a total of 687 data points were collected, each containing nine parameters: Mt, Mad, Aad, Vad, FCad, Had, St,ad, NCV, and Car. After removing missing and abnormal values, 529 samples remained. We applied min-max normalization to all measured data in the dataset; the distribution of normalized Car is shown in Figure 3.
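The min-max normalization step can be sketched as a column-wise rescaling to [0, 1]:

```python
import numpy as np

def min_max_normalize(x):
    """Column-wise min-max normalization: (x - min) / (max - min),
    mapping each parameter's range onto [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=0)
    hi = x.max(axis=0)
    return (x - lo) / (hi - lo)
```

In practice the training-set minima and maxima would also be reused to normalize the test set, so that no test-set statistics leak into training.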

Training and Predicting Datasets Split Methods
In the experiments, training and prediction were performed on the dataset created in Section 4.1 according to three partitioning rules: data partitioning based on stratified sampling, on odd or even months, and on odd or even dates.

Data Partitioning Based on Stratified Sampling
In the experiments of Sections 5.2 and 5.3, 75% of the dataset was assigned to the training set and 25% to the testing set using stratified sampling. The distribution of Car for this partitioning is shown in Figure 4.
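One way to stratify a continuous target such as Car is to bin it into quantile bins and draw the training fraction from each bin, so the train and test distributions match. The bin count and random seed below are assumptions; the paper does not state how its stratification was implemented.

```python
import numpy as np

def stratified_split(y, train_frac=0.75, n_bins=10, seed=0):
    """Return (train_idx, test_idx) preserving the distribution of y:
    bin y into quantile bins, then sample train_frac of each bin."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_bins - 1)
    train_idx = []
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        rng.shuffle(idx)
        train_idx.extend(idx[: int(round(train_frac * len(idx)))])
    train = np.sort(np.array(train_idx))
    test = np.setdiff1d(np.arange(len(y)), train)
    return train, test
```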

Data Partitioning Based on Odd or Even Months
In the experiments of Section 5.4, we divided the data points into odd and even months based on the actual measurement date, giving a total of 272 odd-month and 257 even-month samples. The distribution of Car for odd and even months is shown in Figure 5.

Data Partitioning Based on Odd or Even Days
In the experiments of Section 5.5, we further divided the collected 529 data points into odd and even dates, giving a total of 223 odd-day and 306 even-day samples. The distribution of Car for odd and even days is shown in Figure 6.

Evaluation Metrics
This paper quantitatively compares the proposed method with existing methods using seven evaluation metrics, listed as follows.

1. Mean Absolute Error (MAE)
2. Root Mean Square Error (RMSE)
3. Mean Absolute Percentage Error (MAPE)
4. Coefficient of Determination (R2)
5. Pearson Correlation Coefficient (PCC)
6. Concordance Correlation Coefficient (CCC)
7. Explained Variance (Evar)

Here, δx represents the standard deviation of x, and µx represents the mean of x. The values of MAE, RMSE, and MAPE are all greater than 0, with smaller values indicating better prediction performance. The values of R2, PCC, CCC, and Evar all lie between 0 and 1; the closer the value is to 1, the better the model's prediction performance.
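The seven metrics above follow standard definitions; the paper's exact formulas are not reproduced here, so the sketch below uses the conventional forms (Lin's concordance for CCC, 1 − Var(error)/Var(y) for Evar).

```python
import numpy as np

def regression_metrics(y, y_pred):
    """Compute the seven evaluation metrics (standard definitions)."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    mape = np.abs(err / y).mean() * 100.0          # assumes y != 0
    r2 = 1.0 - (err ** 2).sum() / ((y - y.mean()) ** 2).sum()
    pcc = np.corrcoef(y, y_pred)[0, 1]
    ccc = (2.0 * np.cov(y, y_pred, bias=True)[0, 1]
           / (y.var() + y_pred.var() + (y.mean() - y_pred.mean()) ** 2))
    evar = 1.0 - err.var() / y.var()
    return dict(MAE=mae, RMSE=rmse, MAPE=mape, R2=r2, PCC=pcc, CCC=ccc, Evar=evar)
```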

Experiments and Analysis
In this section, we compare the performance of MSA-Net with existing models. In addition, we conduct ablation experiments to analyze the effects of different parts and combinations in MSA-Net. Finally, we conduct application research under different data partitions, providing an effective solution for thermal power enterprises when the measured data are incomplete and further testing the robustness of the trained model.

Implementation Details
The MSA-Net proposed in this paper is implemented in PyTorch 1.13.0. Model training and testing were conducted on an NVIDIA GeForce RTX 3070 Ti (NVIDIA Corporation, Santa Clara, CA, USA) with 8 GB of memory. In all experiments, we used the Adam optimizer [31] with β1 = 0.9 and β2 = 0.999, and the batch size was set to 16. The model is trained for 240 epochs with a learning-rate warm-up strategy [32]: the first 40 epochs are the warm-up phase, followed by 200 epochs of learning-rate decay. The learning rate lr in each epoch is computed from lrmax, the maximum learning rate, which is set to 0.005.
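A hypothetical schedule matching the described phases is sketched below: a linear ramp to lrmax over the first 40 epochs, then a linear decay over the remaining 200. The paper's exact formula is not reproduced here, so the ramp and decay shapes are assumptions.

```python
def learning_rate(epoch, lr_max=0.005, warmup=40, total=240):
    """Per-epoch learning rate: linear warm-up for `warmup` epochs,
    then linear decay toward zero over the remaining epochs."""
    if epoch < warmup:
        return lr_max * (epoch + 1) / warmup        # warm-up phase
    return lr_max * (total - epoch) / (total - warmup)  # decay phase
```

The two phases join continuously: the last warm-up epoch and the first decay epoch both evaluate to lr_max.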
In the comparison experiments, we considered two input modes. One uses the four parameters of the proximate analysis method: Mad, Aad, Vad, and Qgr,ad. The other uses all eight parameters. Since the training results of neural-network-based methods (such as RNN [33], LSTM [34], MLP [19], and MSA-Net) vary under different parameter initializations, each model was trained to convergence with ten different parameter initializations, and the ten prediction results were evaluated statistically using the metrics in Section 4.2. In the experiments of Sections 5.2 and 5.4, to verify the effectiveness of the Attention Module, the Attention Module was replaced in the MLP baseline with a linear layer of size N × N.
All comparative experiments were conducted using these two model input modes. Referring to the correlation between measured parameters and elemental carbon content in the proximate analysis method in Section 2, the first mode's input consists of four parameters, Mad, Aad, Vad, and Qgr,ad, where Qgr,ad can be calculated from NCV, Had, Mt, and Mad. The second mode's input includes all eight parameters: Mt, Mad, Aad, Vad, FCad, Had, St,ad, and NCV.

Models Performance Experiments and Analysis
To verify the effectiveness of the proposed model, we compared MSA-Net with existing methods; the comparison results are shown in Table 2. When the input is four parameters, MSA-Net achieves the best results on all seven metrics. Compared to the GPR model (SOTA), MSA-Net achieves lower prediction errors: the MAE decreases by 2.67% (from 4.86 to 4.73) and the RMSE by 2.81% (from 6.77 to 6.58). Compared to the MLP model, the MAE decreases by 6.34% (from 5.05 to 4.73) and the RMSE by 4.78% (from 6.91 to 6.58).
When the input is eight parameters, MSA-Net again achieves the best performance on all seven metrics. Compared to the GPR model (SOTA), the MAE decreases by 9.36% (from 5.02 to 4.55) and the RMSE by 10.92% (from 6.96 to 6.20). Compared to the MLP model, the MAE decreases by 6.95% (from 4.89 to 4.55) and the RMSE by 6.91% (from 6.66 to 6.20). Because the Attention module focuses on important or potential features, MSA-Net captures them effectively when all parameters are used as inputs, further improving prediction accuracy: relative to the four-parameter input, the MAE is reduced by 3.81% (from 4.73 to 4.55) and the RMSE by 5.78% (from 6.58 to 6.20). Because neural-network training results differ under different parameter initializations, we also collected the prediction results under each initialization; after sorting the predictions of some samples, a box plot is drawn in Figure 7. Whether the input is four or eight parameters, the uncertainty of MSA-Net's predictions is significantly smaller than that of the RNN, LSTM, and MLP models, and the median is closer to the measured true value. We further summarize the mean and standard deviation of the prediction errors on the test set in Table 3. With four input parameters, MSA-Net achieves the best mean and standard deviation on all seven evaluation metrics. With eight input parameters, MSA-Net achieves the best mean on all seven metrics and the lowest standard deviation on three. Overall, MSA-Net not only achieves good prediction accuracy but also produces stable predictions across different parameter initializations.

Ablation Analysis
To verify the effectiveness of the proposed modules, we conducted the ablation experiments shown in Table 4. Adopting Huber loss on the MLP (Model A) reduces the MAE by 1.78% (from 5.05 to 4.96). Adding one Attention module (Model B) reduces the MAE by 1.01% (from 4.96 to 4.91), and two Attention modules (Model C) reduce it by 1.81% (from 4.96 to 4.87). However, increasing the number of Attention modules to three (Model D) significantly decreases prediction accuracy. Incorporating Skip-Connections on top of two Attention modules, the proposed MSA-Net further reduces the MAE by 2.87% (from 4.87 to 4.73). Increasing the number of input parameters from four to eight reduces the MAE by 3.81% (from 4.73 to 4.55).

Model Testing Experiments on Datasets Divided by Odd and Even Months
We quantitatively compared MSA-Net with RNN, LSTM, and MLP on the seven evaluation metrics, reporting the mean and standard deviation of each metric over 10 different parameter initializations. This experiment was conducted in two ways. The first trains the model on odd-month data and predicts on even-month data; the quantitative results are shown in Table 5. The second trains on even-month data and predicts on odd-month data; the results are shown in Table 6. Comparing predictive stability in Tables 5 and 6, whether the input is four or eight parameters, MSA-Net has the smallest standard deviation on all seven metrics compared to RNN, LSTM, MLP, and Transformer, indicating that MSA-Net predicts more stably under different parameter initialization conditions.
In summary, within a one-year statistical period, thermal power enterprises can measure the elemental carbon in the first nine months as required and use these measurements to train the MSA-Net model. In the last three months, they only need to measure Mad, Aad, Vad, and Qgr,ad to predict Cad with the trained MSA-Net model; the measured Mt can then be used to convert Cad to Car.

Model Testing Experiments on Datasets Divided by Odd and Even Days
We conducted this part of the experiment in two ways. The first trains the model on odd-day data and predicts on even-day data; the quantitative results are shown in Table 7. The second trains on even-day data and predicts on odd-day data; the results are shown in Table 8. In the first method, with eight input parameters, MSA-Net reduces the RMSE by 20.0% compared to MLP (from 7.45 to 5.96) and by 7.74% compared to the Transformer (from 6.46 to 5.96). In the second method, with eight input parameters, MSA-Net reduces the RMSE by 8.66% compared to MLP (from 7.04 to 6.43) and by 7.35% compared to the Transformer (from 6.94 to 6.43). In addition, MSA-Net has the smallest standard deviation on all seven metrics under each partition and input condition, indicating good predictive stability. Because the distributions of odd- and even-day data are inconsistent, and the training set is relatively small compared to the test set, learning is difficult; nevertheless, among all compared models, MSA-Net still achieved the best prediction accuracy, with a MAPE of only 0.76% (four input parameters) and 0.87% (eight input parameters). In summary, within a one-year statistical period, thermal power enterprises can perform complete elemental carbon measurements on odd (even) days and use these data to train the MSA-Net model; on even (odd) days, they can measure Mt, Mad, Aad, Vad, FCad, Had, St,ad, and NCV, and the trained MSA-Net can predict Car from these parameters.

Discussion
Section 5.1 introduced the four parameters commonly used in proximate analysis. When only Mad, Aad, Vad, and Qgr,ad are used, the input implicitly involves a total of six parameters, including Mad, Aad, Vad, FCad, NCV, and Had. This parameter selection completely ignores the effect of St,ad (from ultimate analysis) on the Car calculation. If St,ad is added as a further input, MSA-Net improves over the four-parameter input, as shown in Table 9. However, because the derived parameters inherently contain the other parameters, effective decoupling is not possible. Therefore, in most cases, MSA-Net with eight inputs achieves the highest accuracy.

Conclusions
Car is an important parameter for carbon accounting in thermal power enterprises. However, in actual production, measurement data may be missing due to equipment maintenance, repairs, or damage, and using default values increases the compliance costs of the enterprises. It is therefore of great significance to use reliable prediction models to accurately predict Car when measurement data are missing, in order to ensure the accuracy of carbon accounting and protect the interests of the enterprise. Based on existing research, this paper first analyzes the parameters related to Car. Secondly, these parameters are used as input to a proposed attention-based carbon content prediction model, MSA-Net; the construction process and details of the Attention module are introduced in detail. Then, complete measured data are collected from a thermal power enterprise and, after preprocessing, a Car prediction dataset is constructed for model training and testing. Subsequently, the effectiveness and reliability of MSA-Net are verified by comparison with existing methods on the constructed dataset. Finally, two solutions are proposed to reduce the measurement frequency for thermal power enterprises, thereby reducing their detection costs.

Figure 1 .
Figure 1. The structure of the Attention module.

Figure 3 .
Figure 3. The distribution of normalized Car.

Figure 4 .
Figure 4. The distribution of Car after stratified sampling: (a) the distribution of Car on the training set; (b) the distribution of Car on the testing set.

Figure 5 .
Figure 5. The distribution of Car by months: (a) the distribution of Car on odd months; (b) the distribution of Car on even months.

Figure 6 .
Figure 6. The distribution of Car by days: (a) the distribution of Car on odd days; (b) the distribution of Car on even days.

Figure 7 .
Figure 7. Visual comparison of partial prediction results in the test set: (a) the comparison results of different models (input four parameters); (b) the comparison results of different models (input eight parameters).
Cd is the dry-basis carbon content, Ad the dry-basis ash, Vd the dry-basis volatile matter, St,d the dry-basis total sulfur, and Qgr,d the dry-basis gross calorific value. The fitted relationship is Cd = α0 + α1·Ad + α2·Vd + α3·St,d + α4·Qgr,d, with α0 = 35.411, α1 = −0.341, α2 = −0.199, α3 = −0.412, and α4 = 1.632; Cd is therefore correlated with Ad, Vd, St,d, and Qgr,d. Firstly, since measurements are mostly made on an air-dried basis, basis conversion between air-dried and dry basis via Mad is necessary for prediction. Secondly, because Mad, Aad, Vad, and FCad sum to 100% by mass, FCad can be calculated once Mad, Aad, and Vad are measured; FCad therefore also has a certain correlation with Cd. Then, owing to early CO2 emission calculation methods, the calorific value measured by Chinese thermal power enterprises is the NCV; significantly, Qgr,d can be converted to NCV using Mad and Had, so NCV and Had should also be taken into consideration. Finally, the basis conversion between dry basis and as received requires Mt, which should also be taken into account.
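Assuming the linear form of the fitted relationship above, a minimal sketch of the dry-basis carbon estimate and the moisture-based basis conversion might look like this. The linear combination is reconstructed from the listed coefficients, and the factor (100 − Mt)/100 is the conventional dry-to-as-received conversion; both are assumptions rather than the paper's exact equations.

```python
# Coefficients as listed in the text: alpha_0 .. alpha_4
ALPHA = (35.411, -0.341, -0.199, -0.412, 1.632)

def dry_basis_carbon(a_d, v_d, st_d, qgr_d):
    """Cd = a0 + a1*Ad + a2*Vd + a3*St,d + a4*Qgr,d (reconstructed form)."""
    a0, a1, a2, a3, a4 = ALPHA
    return a0 + a1 * a_d + a2 * v_d + a3 * st_d + a4 * qgr_d

def to_as_received(x_d, m_t):
    """Convert a dry-basis percentage to as-received using total moisture Mt (%)."""
    return x_d * (100.0 - m_t) / 100.0
```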

Table 1 .
Parameters of each layer in MSA-Net.

Table 2 .
Quantitative evaluation of different models (the RNN, LSTM, MLP, and MSA-Net entries are averaged results).

Table 3 .
Quantitative evaluation of model accuracy and stability.

Table 4 .
Ablation analysis results of each module.

Table 5 .
The odd-month data are used for model training, and the even-month data are used for quantitative evaluation of prediction.

Table 6 .
The even-month data are used for model training, and the odd-month data are used for quantitative evaluation of prediction. Comparing the prediction errors of different models, in Table 5, when the inputs are four parameters, MSA-Net reduces the RMSE by 12.97% (from 7.40 to 6.44) compared to RNN, by 15.15% (from 7.59 to 6.44) compared to LSTM, by 8.00% (from 7.00 to 6.44) compared to MLP, and by 3.88% (from 6.70 to 6.44) compared to the Transformer. When the inputs are eight parameters, taking RMSE as an example, MSA-Net decreases it by 14.55% (from 7.49 to 6.40) relative to RNN, 18.26% (from 7.83 to 6.40) relative to LSTM, 12.81% (from 7.34 to 6.40) relative to MLP, and 2.44% (from 6.56 to 6.40) relative to the Transformer. When the number of input parameters increased from four to eight, the MAE of MSA-Net decreased by 1.87% (from 4.81 to 4.72). In Table 6, with four input parameters, MSA-Net reduces the RMSE by 11.19% (from 6.97 to 6.19) compared to RNN, by 11.32% (from 6.98 to 6.19) compared to MLP, and by 7.06% (from 6.66 to 6.19) compared to the Transformer. When the number of input parameters increased from four to eight, the accuracy of MSA-Net slightly decreased, with the MAE rising by 0.70% (from 4.31 to 4.34).

Table 7 .
The odd-day data are used for model training, and the even-day data are used for quantitative evaluation of prediction.

Table 8 .
The even-day data are used for model training, and the odd-day data are used for quantitative evaluation of prediction.

Table 9. The impact of different input parameters on MSA-Net.

Input parameters | MAE | RMSE | MAPE (%)
Aad, Vad, Qgr,ad, St,ad | 4.75 | 6.42 | 0.88
Mt, Mad, Aad, Vad, FCad, Had, St,ad, NCV | | |
Aad, Vad, Qgr,ad, St,ad | 4.26 | 6.10 | 0.77
Mt, Mad, Aad, Vad, FCad, Had, St,ad, NCV | | |
Aad, Vad, Qgr,ad, St,ad | 4.30 | 6.15 | 0.78
Mt, Mad, Aad, Vad, FCad, Had, St,ad, NCV | | |
Aad, Vad, Qgr,ad | 4.98 | 6.69 | 0.91
Mad, Aad, Vad, Qgr,ad, St,ad | 4.85 | 6.60 | 0.89
Mt, Mad, Aad, Vad, FCad, Had, St,ad, NCV | | |