Low-carbon economic dispatch strategy for integrated electrical and gas system with GCCP based on multi-agent deep reinforcement learning

With the growing concern for the environment, sustainable development centred on a low-carbon economy has become a unifying pursuit for the energy industry. Integrated energy systems (IES) that combine multiple energy sources such as electricity, heat and gas are essential to facilitate the consumption of renewable energy and the reduction of carbon emissions. In this paper, a gas turbine (GT), carbon capture and storage (CCS) and a power-to-gas (P2G) device are combined to construct a new carbon capture coupling device model, GT-CCS-P2G (GCCP), which is applied to the integrated electrical and gas system (IEGS). Multi-agent soft actor critic (MASAC) applies historical trajectory representations, parameter spatial techniques and deep densification frameworks to reinforcement learning to reduce the detrimental effects of time-series data on the decision procedure. The energy scheduling problem of the IEGS is redefined as a Markov game, which is addressed by adopting a low-carbon economic control framework based on MASAC with minimum operating cost and minimum carbon emission as the optimization objectives. To validate the rationality and effectiveness of the proposed MASAC-based low-carbon economic scheduling model of the IEGS, simulations are carried out on an integrated PJM 5-node power system and seven-node natural gas system.


Introduction
With population growth and accelerated industrialization, energy consumption is increasing, along with significant greenhouse gas emissions. These emissions have a huge impact on climate change, contributing to issues such as extreme weather events, sea-level rise and ecosystem collapse (Ma et al., 2024; Wojtaszek et al., 2024). It is therefore crucial to adopt energy-saving and emission-reduction measures (Li et al., 2019). By promoting the use of renewable energy, improving energy efficiency and adopting cleaner production technologies, we can reduce dependence on fossil fuels and lower greenhouse gas emissions (Liu et al., 2024; Okedu et al., 2024).
The United Nations Framework Convention on Climate Change, which entered into force in 1994, became the world's first international treaty to reduce emissions of greenhouse gases such as carbon dioxide and methane, requiring all countries to take some responsibility for carbon emission reduction (Werksman, 1994). The Kyoto Protocol, which entered into force in 2005, provided programs to reduce carbon emissions and established corresponding cooperation mechanisms for countries to deal with carbon emission reduction, thus advancing global carbon emission reduction (Gallo et al., 2018). The Paris Agreement, adopted in Paris in 2015, requires developed countries, while accomplishing their own carbon emission reduction tasks, to provide financial support and related technologies to developing countries, contributing to the fulfillment of the global carbon emission reduction tasks (Elsayed et al., 2024). In 2021, China proposed to vigorously develop renewable energy sources such as wind power and photovoltaics, laying a solid foundation for achieving carbon peak by 2030 and carbon neutrality by 2060 (Zhong et al., 2023). However, renewable energy sources such as wind turbines (WT) and photovoltaics (PV) suffer from time dependence, stochasticity and volatility. When a high percentage of renewable energy is injected into the grid, it can cause problems such as sudden voltage changes (Cao et al., 2020a; Cao et al., 2021) and system collapse (Barker and Mello, 2000; Dulăua et al., 2013), which has also become a focus of carbon emission reduction research (Gao and Zhang, 2024).
In traditional energy dispatching, the various energy sources are dispatched by different sub-networks (e.g., the electrical network, gas network, and heat/cooling network), which are controlled by different departments. In actual operation, these networks are coupled with each other in terms of production, transmission, distribution and utilization. However, isolated scheduling prevents effective information sharing and energy complementarity between sub-networks, and therefore cannot guarantee system stability. Multi-energy flow deployment can improve the efficiency of energy utilization, reduce the total cost of system operation, and improve the stability and safety of system operation (Liu et al., 2018; Wang et al., 2023a). Consequently, breaking the limitations of the traditional energy architecture and constructing multi-energy network architectures, such as the electricity-heat-gas-cooling integrated energy system (IES), to tap the potential of energy transmission between different systems is one of the core research topics for scholars in related fields in various countries (Liu et al., 2023).
Substituting conventional thermal power generation with renewable energy sources such as wind and photovoltaics, reducing the utilization rate of conventional thermal power plants by applying power-to-gas (P2G) and gas turbines (GT), and achieving carbon dioxide (CO2) absorption and utilization are significant in reducing carbon emissions. Carbon capture and storage (CCS) provides an alternative and effective technology for dealing with CO2 emissions, with 92 per cent of the CO2 produced by coal-fired units being captured and stored in the IES. The combination of P2G and CCS technologies can therefore effectively reduce carbon emissions, and the carbon capture technology also supplies the carbon feedstock for methane generation by P2G, thereby reducing the amount of CO2 emitted by coal-fired units in the IES (Gu et al., 2017; Yang et al., 2019). He et al. (2022) constructed a near-zero emission park-level IES with P2G and CCS that considers uncertainty. Zhang et al. (2020) constructed an integrated electricity-gas energy system (IEGS) optimization model considering P2G and wind power uncertainty based on distributed robust optimization. The superiority of the low-carbon operation was verified by the results of three different IEGSs. In order to determine the optimal capacity of the gas turbine and P2G technology for different IEGSs, a Monte Carlo based optimization framework was proposed in Tabebordbar et al. (2023). The experimental results demonstrate the superiority of the reliability-oriented optimization framework. However, the algorithms adopted in the above-mentioned literature struggle to attain satisfactory results given the complexity and diversity of the system, the mutual constraints of the coupled energy components and the large dimension of the optimization problem.
With the rapid progress of artificial intelligence technology, many scholars have proposed a variety of optimization and control strategies based on machine learning. Reinforcement learning (RL), which involves both an agent and an environment, is currently one of the most popular methods for solving control optimization problems (Cao et al., 2020b; Zhang et al., 2023a; Cao et al., 2023; Li et al., 2024a, 2024c). Zhang et al. (2023b) proposed a two-timescale energy management strategy based on multi-agent deep RL (MADRL) for a residential multicarrier energy system, where the optimal solution of each coupling element in the system is obtained to achieve the optimal control effect. Aiming at the joint operation of multiple microgrids, a MADRL-based energy management method is proposed in Li et al. (2023). Each microgrid acts as an agent in a game with the others, and continuous training ensures that each agent chooses its locally optimal strategy under the globally optimal situation. Taking into account the different characteristics of the electricity and heat networks, Monfaredi et al. (2023) achieve an hourly optimal scheduling strategy by scheduling multiple renewable energy sources. During the optimization process, MADRL is applied to achieve information interaction between the energy storage system, new energy sources, the heat and power conversion system and the grid, which yields a desirable control strategy that improves energy utilization. In order to address the distributed energy management problem of multi-area IES, a MADRL-based energy management strategy is proposed that effectively decreases the influence of renewable energy uncertainty on the decision-making of the optimization model by exploiting the generalization capability of RL (Ding et al., 2024). A MADRL-based building energy management model has also been proposed, which achieves excellent dynamic decision making through centralized training and distributed execution (Wang et al., 2024).
This paper proposes a low-carbon and economic IEGS scheduling method based on multi-agent soft actor critic (MASAC), which achieves bidirectional coupling between the electrical network and the gas network by utilizing P2G and GT. The CCS captures the CO2 produced by the power plant as feedstock for the CH4 produced by the P2G, which reduces the carbon emissions of the system, and emission rights for the remaining CO2 are purchased through the carbon trading market to achieve a zero-carbon system. The main contributions of this paper can be summarized as follows.
1) A GT-CCS-P2G (GCCP) model is presented to achieve two-way coupling between the electrical and gas networks in the IEGS.
2) The electrical network and gas network are each modelled as an agent to enhance the generalization capability of the energy dispatch model through reciprocal gaming.
3) A novel energy scheduling strategy model based on MASAC exploiting historical data is proposed.
The remainder of this article is structured as follows. Section 2 focuses on the theory of electrical and gas networks and the related coupling elements. Section 3 describes the algorithmic solution process of the method proposed in this paper. Section 4 verifies the superiority of the proposed method through a detailed analysis of examples. Section 5 summarizes the paper.

Problem formulation
The detailed structure of the IEGS is shown in Figure 1, including the electrical and gas networks. Electricity in the power network is supplied by thermal power plants, GT and WT, where the CO2 from the thermal power plants is captured by CCS and used as feedstock for methane generation by P2G. The captured CO2 is converted to CH4 by P2G and transmitted to the natural gas network. For CO2 that cannot be captured by CCS, carbon emission rights are purchased in the carbon trading market to achieve the zero-carbon target. GT, CCS and P2G together realize the bi-directional coupling between the electrical and gas networks, enabling a bi-directional flow of energy. The battery energy storage system (BESS), as a rechargeable and dischargeable energy device, increases the proportion of renewable energy consumed by the electrical network.

Natural gas system modelling
In a natural gas system, which consists of gas sources, gas loads, transmission pipelines, and compressors, natural gas is transmitted to the consumer through pipelines (Zhang et al., 2024a). Natural gas system modelling mainly covers the gas source, load, nodal pressure and pipeline flow.

Gas source and load
The main gas sources of a typical natural gas network are gas wells and gas storage stations. In practice, the supply of natural gas from gas wells is not unlimited and must satisfy certain constraints, which can be expressed as Eq. 1.
$S_{i,\min} \le S_{i,t} \le S_{i,\max}$ (1)
where $S_{i,t}$ is the natural gas supply; $S_{i,\min}$ and $S_{i,\max}$ represent the minimum and maximum natural gas supply, respectively.

Pipeline flow modelling
During the transport of natural gas, its flow rate does not decrease. Analogous to voltage losses in a power system, nodal pressure losses exist between the beginning and the end of a natural gas pipeline. Flow always moves from the high-pressure node to the low-pressure node of the pipeline, with a magnitude that depends on the length and diameter of the pipeline and on the operating temperature and pressure. The relationship between the pipe flow rate and the pipe pressure can be expressed as Eqs 2, 3 (Dai et al., 2020).
where $B_{ij}$ is the pipe flow rate; $\pi$ is the node pressure; $\mathrm{sgn}(\pi_i, \pi_j)$ is a function whose value is 1 when the pressure at node i is higher than that at node j and −1 otherwise; $C_{ij}$ is a coefficient; $\pi_{i,\min}$ and $\pi_{i,\max}$ are the minimum and maximum of the node pressure.

FIGURE 1
The structure of the IEGS.

Frontiers in Energy Research frontiersin.org
For an acyclic natural gas network, the correlation matrix between the injected flow at each node and the pipeline flow can be established by using the forward-backward generation method, which is similar to the concept of the generation shift factor (GSF) in the DC power flow method of the power system. The relationship between the natural gas supply and load at each node and the pipeline flow is represented by Eq. 4.
where n denotes the natural gas injection node; $GL_n$ indicates the gas load consumed at node n.
A link between the pipe nodes is established based on Eq. (4). Therefore, the pressure at each node can be obtained from the acquired pipeline flow rate based on Eq. (2).
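The following minimal sketch illustrates the pipeline relationship described above. Because Eq. 2 itself is not reproduced in the text, the code assumes the common Weymouth-type form $B_{ij} = \mathrm{sgn}(\pi_i, \pi_j)\, C_{ij} \sqrt{|\pi_i^2 - \pi_j^2|}$, which may differ in detail from the form used in this paper. The function names and the numerical values are hypothetical.

```python
import math


def sgn(pi_i: float, pi_j: float) -> int:
    """Flow direction: 1 if the pressure at node i exceeds that at node j, else -1."""
    return 1 if pi_i > pi_j else -1


def pipe_flow(pi_i: float, pi_j: float, c_ij: float) -> float:
    """Assumed Weymouth-type relation between nodal pressures and pipe flow B_ij."""
    return sgn(pi_i, pi_j) * c_ij * math.sqrt(abs(pi_i**2 - pi_j**2))


def downstream_pressure(pi_i: float, flow_ij: float, c_ij: float) -> float:
    """Invert the assumed relation: recover pi_j from pi_i and a forward flow B_ij >= 0."""
    squared = pi_i**2 - (flow_ij / c_ij) ** 2
    if squared < 0:
        raise ValueError("flow too large for the given upstream pressure")
    return math.sqrt(squared)


def pressure_within_limits(pi: float, pi_min: float, pi_max: float) -> bool:
    """Nodal pressure bound check (the role Eq. 3 plays in the text)."""
    return pi_min <= pi <= pi_max


# Hypothetical values in arbitrary consistent units
flow_forward = pipe_flow(5.0, 3.0, 2.0)
flow_reverse = pipe_flow(3.0, 5.0, 2.0)
pi_end = downstream_pressure(5.0, flow_forward, 2.0)
```

Once the pipeline flows are known from the nodal balance, a function such as `downstream_pressure` can be applied pipe by pipe from the source node outward to recover the nodal pressures, which is the procedure the paragraph above describes.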

Compressor station
As the distance of gas transmission increases, pressure losses between nodes can lead to low pressure at the end of the pipeline, which limits the transmission capacity of the network. The most important part of the compressor station is the compressor, which consumes electrical energy to increase the pressure of the natural gas. The compressor station considered in this paper has a fixed compression ratio, and the energy it consumes is electrical energy included in the load of the grid node (Bai et al., 2016). It can be presented as Eqs 5, 6.
where $H_{com}$ represents the power required by the compressor; E, G and χ are coefficients; $P_{com}$ represents the electrical load required by the compressor.

Electricity system modelling
The results of power flow calculations are the basis for analyzing the feasibility, safety, reliability and economics of grid planning and supply options. Power flow analysis plays a vital role in grid operation modelling and design, and can be calculated as Eqs 7, 8.
where $P_{i,t}$ denotes the active power injected at node i at time t; $G_{ij,t}$ and $B_{ij,t}$ represent the conductance and susceptance, i.e., the real and imaginary parts of element (i, j) of the nodal admittance matrix; N indicates the total number of nodes; $\theta_{ij,t}$ denotes the voltage phase angle difference between nodes i and j; $Q_{i,t}$ denotes the reactive power injected at node i at time t.
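As an illustration of the quantities defined above, the sketch below evaluates the nodal injections with the standard polar-form AC power flow equations. Eqs 7, 8 are not reproduced in the text, so it is an assumption that they take this standard form; the time index t is dropped, and the two-bus admittance value is hypothetical.

```python
import numpy as np


def power_injections(v, theta, g, b):
    """Standard polar-form AC power flow injections.

    P_i = V_i * sum_j V_j * (G_ij * cos(theta_ij) + B_ij * sin(theta_ij))
    Q_i = V_i * sum_j V_j * (G_ij * sin(theta_ij) - B_ij * cos(theta_ij))
    with theta_ij = theta_i - theta_j.
    """
    dtheta = theta[:, None] - theta[None, :]   # theta_ij for every node pair
    vv = v[:, None] * v[None, :]               # V_i * V_j for every node pair
    p = np.sum(vv * (g * np.cos(dtheta) + b * np.sin(dtheta)), axis=1)
    q = np.sum(vv * (g * np.sin(dtheta) - b * np.cos(dtheta)), axis=1)
    return p, q


# Hypothetical two-bus system: one line with series admittance y (per unit)
y = 1.0 - 5.0j
ybus = np.array([[y, -y], [-y, y]])
v = np.array([1.0, 1.0])

# Equal voltages and angles: no power flows
p_flat, q_flat = power_injections(v, np.array([0.0, 0.0]), ybus.real, ybus.imag)

# A small angle difference drives active power from bus 0 to bus 1
p_ang, q_ang = power_injections(v, np.array([0.1, 0.0]), ybus.real, ybus.imag)
```

In the second case the sum of the active injections equals the resistive loss of the line, which is a convenient sanity check for any implementation of these equations.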

Coupling elements modelling
The GT and the P2G technology enable the deep coupling of the power system with the natural gas system, thereby enabling a bidirectional conversion of the electricity-gas system.

GT
The GT can be viewed as a power source in the power system and as a load in the natural gas system. The relationship between the power generated and the natural gas consumed can be expressed as Eq. 9 (Ji et al., 2013).
where $P_{GT,i,t}$ represents the electricity generated by the GT at node i at time t; $\zeta_{GT}$ represents the conversion efficiency of the GT; $GL_{GT,i,t}$ represents the gas load of the GT at node i at time t.

P2G
P2G technology consists of two main steps: the electrolysis of water and the synthesis of methane. The chemical equations for the two reactions are expressed as Eqs 10, 11 (Clegg and Mancarella, 2015).
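For reference, the two reactions named above can be written in their standard stoichiometric form. The equation numbers are assumed to correspond to Eqs 10, 11, which are not reproduced in the text; the first line is water electrolysis and the second is the Sabatier methanation reaction.

```latex
\begin{align}
2\,\mathrm{H_2O} &\rightarrow 2\,\mathrm{H_2} + \mathrm{O_2} \tag{10}\\
\mathrm{CO_2} + 4\,\mathrm{H_2} &\rightarrow \mathrm{CH_4} + 2\,\mathrm{H_2O} \tag{11}
\end{align}
```

Taken together, each mole of CH4 produced consumes one mole of CO2 and four moles of H2.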
The hydrogen produced in the first step can be stored in a hydrogen storage facility or blended with natural gas and injected into the network, but its concentration is limited for safety reasons. The methane produced in the second step can be stored in large quantities or transported to wherever it is needed, and its production absorbs a large amount of CO2 and reduces carbon emissions. P2G not only strengthens the coupling of the electricity-gas network, but also consumes the electricity generated by renewable energy, increasing the proportion of renewable energy consumption. The conversion relationship between the electrical energy consumed by P2G and the natural gas generated can be expressed as Eqs 12, 13.

CCS
CCS has been identified as a key and promising technology for future power generation (Zhang et al., 2024). Capture and storage are the two main phases of current carbon capture technology. The capture process is complex, and the main commercially available CO2 capture methods fall into three categories: oxy-fuel combustion, pre-combustion and post-combustion technologies. Post-combustion technology treats the CO2-containing gases produced by conventional fossil fuel plants and separates the CO2 from the other gases. Pre-combustion technology, on the other hand, pre-treats the fuel to separate the carbon in it from other substances. Unlike these two technologies, oxy-fuel combustion technology changes the environment in which the fuel is burned, allowing it to be burned in an environment containing only oxygen to obtain carbon dioxide and water. Of these, post-combustion is currently the most widely used method, and it is also the most cost-effective of the three CO2 capture technologies. Sequestration begins with the construction of pipelines to transport the carbon dioxide, which is then compressed and sequestered.
The electrical energy consumed by the CCS during operation is expressed as Eq. 14.
where $P_{CCS,t}$ represents the power consumed by the CCS to capture CO2 at time t; $\zeta_{CCS}$ indicates the CCS capture efficiency; $C^{CO_2}_{CCS,t}$ denotes the amount of carbon dioxide captured at time t.

GT-CCS-P2G
Conventionally, carbon capture power plants have operated CCS in combination with thermal power plants or CHP. However, to better reduce carbon emissions, the conventional power plant is replaced in this paper by a GT coupled with CCS and P2G, which enhances the coupling of the electrical energy flow. In terms of carbon emission, the CCS captures the carbon dioxide emitted by the GT and supplies the P2G with CO2 to generate methane. In terms of energy supply, the GT unit supplies electricity to the P2G and CCS, while the P2G can also supply a small amount of natural gas to the GT. In terms of economic cost, the P2G avoids the cost of purchasing CO2 and the CCS reduces the electricity purchased from the main grid. The energy flow route of GT-CCS-P2G (GCCP) is shown in Figure 2.

GCCP operational power
The power consumed by P2G and CCS in the GCCP combined operation model is supplied by GT and the excess power will participate in the power network dispatch which can be calculated as Eq. 15.
$P_{GCCP,t} = P_{GT,t} - P_{CCS,t} - P_{P2G,t}$ (15)
where $P_{GCCP,t}$ indicates the power with which the GCCP participates in grid dispatch at time t; $P_{GT,t}$ denotes the power produced by the GT at time t.
The GT, CCS and P2G power constraints can be formulated as Eqs 16-18.
$P_{GT,\min} \le P_{GT,t} \le P_{GT,\max}$ (16)
$P_{CCS,\min} \le P_{CCS,t} \le P_{CCS,\max}$ (17)
$P_{P2G,\min} \le P_{P2G,t} \le P_{P2G,\max}$ (18)
where $P_{GT,\min}$ and $P_{GT,\max}$ indicate the minimum and maximum operating power of the GT, respectively; $P_{CCS,\min}$ and $P_{CCS,\max}$ represent the minimum and maximum power for capturing CO2 by CCS, respectively; $P_{P2G,\min}$ and $P_{P2G,\max}$ denote the minimum and maximum power for P2G operation, respectively.
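The power relationship of Eq. 15 and the bounds of Eqs 16-18 can be sketched as a small helper. This is only an illustration: the function name, the dictionary layout and all numerical limits are hypothetical and are not taken from the case study.

```python
def gccp_dispatch_power(p_gt, p_ccs, p_p2g, limits):
    """Net power the GCCP offers to grid dispatch (Eq. 15).

    `limits` maps each device name to its (min, max) operating power,
    mirroring the box constraints of Eqs 16-18.
    """
    for name, value in (("GT", p_gt), ("CCS", p_ccs), ("P2G", p_p2g)):
        lower, upper = limits[name]
        if not lower <= value <= upper:
            raise ValueError(f"{name} power {value} outside [{lower}, {upper}]")
    return p_gt - p_ccs - p_p2g


# Hypothetical operating limits in MW
limits = {"GT": (20.0, 150.0), "CCS": (0.0, 30.0), "P2G": (0.0, 50.0)}
p_gccp = gccp_dispatch_power(100.0, 15.0, 25.0, limits)
```

A negative result would mean that the CCS and P2G draw more power than the GT produces, in which case the GCCP acts as a net load on the grid rather than a source.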

GCCP carbon emission calculation
During operation of the GCCP, the GT releases CO2 by burning natural gas, which can be expressed as Eq. 19.
where $C^{CO_2}_{GT,t}$ denotes the CO2 produced by GT operation at time t; $\zeta^{CO_2}_{GT}$ indicates the carbon emission factor for GT operation.
During GCCP operation, P2G reduces the carbon emissions of the system by absorbing CO2. The P2G synthesis of CH4 is divided into two main steps. The first step is the electrolysis of water, in which electrical energy is converted into hydrogen energy in an electrolytic tank. The second step is methanation, in which the generated hydrogen reacts with carbon dioxide in a methane reactor (Sabatier reaction) to produce CH4 and heat. The CO2 absorbed can be calculated as Eq. 20.
where $C^{CO_2}_{P2G,t}$ indicates the amount of CO2 captured during P2G operation at time t; $\zeta_{H_2\text{-}CO_2}$ denotes the coefficient of conversion between H2 and CO2; $\zeta^{H_2}_{P2G}$ represents the efficiency of hydrogen generation by P2G.
In the GCCP coupling model, the CCS captures CO2 from GT operation while simultaneously providing the P2G with the CO2 required for CH4 production. The carbon emissions from the GCCP can be expressed as follows (Eq. 21).
where $C^{CO_2}_{GCCP,t}$ is the carbon emission of the GCCP at time t.
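The carbon accounting described above can be sketched as follows. Eqs 19-21 are not reproduced in the text, so every functional form here is an assumption: GT emissions are taken as linear in GT output, the CO2 absorbed by P2G as the product of the two conversion coefficients and the P2G power, and the net GCCP emission as GT emissions minus the CO2 captured by CCS. The coefficient values are hypothetical.

```python
def gt_emission(p_gt, zeta_co2_gt):
    """Assumed form of Eq. 19: CO2 released by the GT, linear in its output."""
    return zeta_co2_gt * p_gt


def p2g_co2_absorbed(p_p2g, zeta_h2_co2, zeta_h2_p2g):
    """Assumed form of Eq. 20: CO2 absorbed by methanation for a given P2G power."""
    return zeta_h2_co2 * zeta_h2_p2g * p_p2g


def gccp_net_emission(c_gt, c_ccs):
    """Assumed form of Eq. 21: GT emissions not captured by the CCS (never negative)."""
    return max(c_gt - c_ccs, 0.0)


# Hypothetical coefficients and operating point
c_gt = gt_emission(p_gt=100.0, zeta_co2_gt=0.5)                      # tons of CO2
c_p2g = p2g_co2_absorbed(p_p2g=40.0, zeta_h2_co2=0.25, zeta_h2_p2g=0.5)
c_gccp = gccp_net_emission(c_gt, c_ccs=45.0)
```

Under these assumptions, the net emission `c_gccp` is the quantity for which emission rights would need to be purchased in the carbon market.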

Objective function
In consideration of the above-mentioned model, the energy scheduling of the IEGS is viewed as an optimization problem that minimizes the following objective function (Eq. 22).
$\min \sum_{t=1}^{T} \left( C_{CCS,t} + C_{P2G,t} + C_{P,t} + C_{E,t} + C_{Wind,t} + C_{Gas,t} \right)$ (22)
where T represents the length of the operating horizon in hours; $C_{CCS,t}$ is the cost of CCS at time t; $C_{P2G,t}$ indicates the cost of P2G at time t; $C_{P,t}$ denotes the cost of purchasing carbon emission rights at time t; $C_{E,t}$ is the cost of purchasing coal at time t; $C_{Wind,t}$ represents the cost of abandoned wind at time t; $C_{Gas,t}$ is the cost of acquiring gas at time t.
The electricity consumed by the CCS to collect CO2 from the gas produced by the thermal power unit through compression and separation mainly consists of fixed energy consumption and operation energy consumption. The detailed calculation is expressed as follows (Eqs 23-26).
where $P^{e}_{CCS}$ represents the CCS fixed energy consumption; ψ is the energy coefficient for capturing CO2; $C^{CO_2}_{CCS,t}$ indicates the amount of CO2 captured at time t; $C^{e}_{CCS}$ is the operating cost of CCS; $\eta_e$ denotes the price of electricity at time t; $C^{r}_{CCS}$ represents the depreciated cost of CCS; $C_a$ is the total investment cost of CCS; $\omega_a$ represents the depreciation factor of CCS; $N_a$ is the depreciation period of CCS in years.
Similar to CCS, the cost of P2G can be expressed as Eqs 27-29.
where $C^{e}_{P2G}$ is the operating cost of P2G; $\eta_{CO_2}$ denotes the price of CO2; $Z_{CO_2}$ represents the volume of CO2 absorbed by P2G; $\eta_{CH_4}$ denotes the proceeds from the generation of CH4; $E_{CH_4,t}$ indicates the total volume of CH4 produced at time t; $C^{r}_{P2G}$ represents the depreciated cost of P2G; $C_b$ is the total investment cost of P2G; $\omega_b$ represents the depreciation factor of P2G; $N_b$ is the depreciation period of P2G in years.
The remaining cost terms are given by Eqs 30-33.
where τ denotes the coefficient for purchasing carbon emissions; ς is the CO2 emission factor for thermal power units; $P_{electricity,t}$ indicates the power purchased from the grid at time t; $Z_{CO_2,t}$ represents the volume of CO2 captured by CCS at time t; $a_E$, $b_E$ and $c_E$ are the operating cost coefficients of the thermal power units; $\eta_{Wind}$ represents the wind curtailment cost factor; $\Delta P_{Wind,t}$ is the curtailed wind power; $\eta_{Gas}$ indicates the price of natural gas; $E_{Gas,t}$ represents the volume of gas consumed by the gas network at time t.

Constraints
The constraints that need to be satisfied during power system operation include power balance, nodal voltage limits and thermal generator output constraints (Eqs 34-37).
$P_{Load,t} = P_{electricity,t} + P_{Wind,t} - \Delta P_{Wind,t} + P_{GCCP,t}$ (34)
where $P_{Load,t}$ denotes the load power at time t; $V_{i,t}$ represents the voltage at node i at time t; $V_{\min}$ and $V_{\max}$ are the lower and upper voltage limits for safe grid operation; $P_{\min,t}$ and $P_{\max,t}$ denote the lower and upper output thresholds of the thermal generators, respectively; $d_{\min,t}$ and $d_{\max,t}$ indicate the lower and upper thresholds of the ramping power of the thermal generators, respectively.
The gas network system consists of three main components: the gas supply source, the gas network and the gas load (Eqs 38, 39).
$E_{Gas,a} + E_{P2G,a} = E_{Load,a} + E_{GT,a}$ (38)
where $E_{Gas,a}$ is the injection at the gas source point of node a; $E_{P2G,a}$ denotes the amount of gas produced by P2G at node a; $E_{Load,a}$ represents the gas load required by the gas network at node a; $E_{GT,a}$ is the gas load consumed by the GT at node a; $F_{\min}$ and $F_{\max}$ denote the lower and upper thresholds of the natural gas flow delivered by the pipeline, respectively; $F_t$ represents the flow rate conveyed by the pipe at time t.
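The two balance equations above (Eqs 34 and 38) can be checked numerically with a short sketch. The helper names and all values are hypothetical; the first function simply rearranges the power balance to give the electricity that must be purchased, and the second returns the mismatch of the nodal gas balance.

```python
def grid_purchase(p_load, p_wind, dp_wind, p_gccp):
    """Rearranged Eq. 34: power to purchase so that supply equals load.

    p_wind is the available wind power and dp_wind the curtailed part of it.
    """
    return p_load - (p_wind - dp_wind) - p_gccp


def gas_balance_residual(e_gas, e_p2g, e_load, e_gt):
    """Eq. 38 written as supply minus demand; zero means the node is balanced."""
    return (e_gas + e_p2g) - (e_load + e_gt)


# Hypothetical operating point (MW for electricity, consistent gas units)
p_buy = grid_purchase(p_load=300.0, p_wind=120.0, dp_wind=20.0, p_gccp=60.0)
residual = gas_balance_residual(e_gas=80.0, e_p2g=10.0, e_load=50.0, e_gt=40.0)
```

In a scheduling loop, a non-zero residual or a purchase outside the allowed range would indicate an infeasible action, which is typically handled with a penalty term in the reward.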
The constraints for the other auxiliary equipment are expressed as Eqs 40, 41.

Markov game modeling
The energy management optimization problem can be modelled as a Markov game, which is solved by the presented MADRL algorithm. The Markov game contains several components (Li et al., 2023).
• Agent: In the Markov game, the power grid and the gas grid are each modelled as an agent.
• Environment: Before each decision, the agents collect information from the nodes in their corresponding regions. Each agent makes a decision based on its local observation, and the environment calculates the reward value for each agent based on that decision.
[...] indicate the actions of the grid agent and the gas network agent, respectively.
• Reward: The reward value obtained by the system is the value returned when each agent performs an action based on the current state. Each agent shares the same reward function, which is expressed as Eq. 46.
where $l_1$ and $l_2$ denote weighting coefficients; Z indicates a constant.
The energy supply optimization problem of the IEGS is thus transformed into a Markov game, in which the grid agent and the gas agent search for optimal actions by continuously learning through the game to attain the best control. In the training process, each agent selects its action by observing part of the state, and the corresponding reward value is passed back to the agent. Along with the reward value, the agent observes the environment state at the next time step. As the number of iterations increases, each agent continuously adjusts its actions through the mutual game to maximize the reward value.

FIGURE 3
Training process of the proposed MADRL method.

Proposed approach based on MADRL
Each agent adopts the actor-critic network framework, in which the actor network generates the strategy (the actions) and the critic network evaluates it to guide the updating of the strategy parameters. Through the interaction and iteration between the two networks, the network parameters are continuously updated and the reward value gradually moves towards its maximum. The proposed method adopts MASAC (Li X. Y. et al., 2024; Hu et al., 2024) as its kernel, which effectively mitigates the influence of environmental data fluctuations on energy scheduling decisions by sharing environmental and historical information between agents. Each agent in MASAC has four deep neural networks, namely, an actor network, a critic network, a target actor network and a target critic network. During the training process, only the parameters of the actor network and the critic network are updated directly, whereas the target actor network and the target critic network are employed to stabilize the learning of the actor network and the critic network.

Critic network
The target critic network is mainly employed to slow the rate of parameter updates in order to balance the stability and speed of the training process. It is presented as Eq. 47.
where $\pi_{\vartheta}(\cdot \mid s^{g}_{t})$ is the value function in the actor network of agent g; $\pi_{\vartheta'}(\cdot)$ denotes the function of the target actor network; $\vartheta'$ denotes the parameters of the target actor network; $\theta'$ denotes the parameters of the target critic network; $a^{g}_{t}$ represents the value passed from the actor network of agent g. The computed Q-value is used to compute the loss function of the critic network, which can be calculated as Eqs 48, 49.
where $h_t$ is the target value of Q; $r(s^{g}_{t}, a^{g}_{t})$ denotes the total reward obtained by the agents performing action $a^{g}_{t}$ in global state $s^{g}_{t}$; υ represents the discount factor; θ denotes the parameters of the critic network; E(·) indicates the mathematical expectation.
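To make the roles of the target value and the critic loss concrete, the sketch below uses the entropy-regularized target commonly found in soft actor-critic implementations, together with a mean-squared-error loss. Eqs 47-49 are not reproduced in the text, so this standard form is an assumption and may differ from the paper's exact expressions. The default discount factor (0.95) and temperature (0.1) follow the parameter values listed in Table 2.

```python
import numpy as np


def soft_q_target(reward, q_next_target, log_prob_next, discount=0.95, temperature=0.1):
    """Assumed soft target: h = r + discount * (Q'(s', a') - temperature * log pi(a'|s')).

    q_next_target comes from the target critic and log_prob_next from the target
    actor, both evaluated at the next state.
    """
    return reward + discount * (q_next_target - temperature * log_prob_next)


def critic_loss(q_pred, target):
    """Mean squared error between the critic's Q-values and the targets."""
    return float(np.mean((np.asarray(q_pred) - np.asarray(target)) ** 2))


# Hypothetical single transition
h = soft_q_target(reward=1.0, q_next_target=2.0, log_prob_next=-1.0)

# Hypothetical mini-batch of two Q-values and targets
loss = critic_loss([1.0, 3.0], [2.0, 1.0])
```

The entropy term makes less likely actions (more negative log-probability) raise the target slightly, which is the mechanism by which soft actor-critic methods encourage exploration.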
The critic network parameters are updated by the gradient descent method using the gradient $\nabla_{\theta} L(\theta)$, which can be presented as Eqs 50, 51.
where $\beta_c$ is the learning rate of the critic network; $\theta_{t+1}$ denotes the parameters of the critic network at time t+1.

FIGURE 4
Framework diagram of integrated PJM-5 node power system and seven node gas system.

TABLE 2
Parameters of the proposed approach.
Parameters                 Value
Temperature parameter      0.1
Reward discount factor     0.95
Memory capacity            1e6
Learning rate of actor     1e-3
Learning rate of critic    1e-3
Soft replacement           1e-2
Batch size for updating    256

Actor network
The expression for the value function in the actor network is Eq. 52.
where ϑ denotes the parameters of the actor network. The gradient of the actor network value function, $\nabla_{\vartheta} L(\vartheta)$, is given by Eqs 53, 54.
where $\beta_a$ is the learning rate of the actor network.
In order to prevent the value function in the critic network from overly agreeing with the Q-value calculated by the target value function, a corresponding noise function $\varsigma_t$ based on a normal distribution is attached to the value passed from the target value [...]

FIGURE 5
Convergence process of proposed method on the train set.

FIGURE 6
Load for a particular day on the test set.
In the training process, the critic network mainly provides guidance for the actor network to select the optimal action. If the difference between the Q-value computed by the critic network and the target value function is large, the actions learned by the actor network will be dispersed and the critic network will be unstable in learning the value function. Therefore, the parameters of both the target actor network and the target critic network are updated only after a period of training, through soft updating as in Eqs 57, 58 (Li et al., 2023).
where ε is the soft update factor, which has a value much less than 1.
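A soft update of this kind is commonly implemented as a weighted average of the target and online parameters. The sketch below assumes the usual form, target ← ε · online + (1 − ε) · target, since Eqs 57, 58 are not reproduced in the text; the default ε = 1e-2 follows the "soft replacement" value in Table 2, and the parameter arrays are hypothetical.

```python
import numpy as np


def soft_update(target_params, online_params, eps=1e-2):
    """Move each target parameter array a small step towards its online counterpart."""
    return [eps * w + (1.0 - eps) * w_target
            for w_target, w in zip(target_params, online_params)]


# Hypothetical two-layer parameter sets
online = [np.ones((2, 2)), np.ones(2)]
target = [np.zeros((2, 2)), np.zeros(2)]
target = soft_update(target, online)
```

With ε much less than 1, the target networks track the online networks slowly, which keeps the target Q-values stable between updates.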
The detailed flowchart of the MADRL algorithm proposed in this paper is shown in Figure 3.

Case study
In this section, the parameters of the IEGS and the proposed algorithm are first described, followed by an example to assess the effectiveness and superiority of the proposed approach.

TABLE 3
Approaches: PSO, MADDPG, MATD3, Proposed approach

Case study setup
In order to effectively evaluate the performance of the proposed scheme, the integrated PJM-5 node system (Li et al., 2017) and seven-node natural gas system (Li et al., 2008) are selected for experimental analysis. P2G is connected to the WT at PJM-5 node E, which decreases the wind abandonment rate of the WT, and the P2G simultaneously delivers CH4 through node 3 of the gas system. The GT achieves the conversion between gas and electricity by connecting PJM-5 node D and gas system node 6. The specific system architecture is shown in Figure 4. The electricity price is divided into three different prices as shown in Table 1, where the electricity price for 0:00-8:00 and 22:00-24:00 is $105.06/MW, for 8:00-12:00 and 18:00-22:00 is $130.36/MW and for the remaining hours is $177.24/MW. The price of gas sold from the two wells is $78.39/MW. The price for purchasing carbon credits in the carbon market is $15/ton. Detailed parametric data of the IEGS can be found in Li et al. (2023). Parameters of the proposed approach are shown in Table 2.
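The three-tier electricity price described above can be encoded as a simple lookup. This is a sketch for illustration: the function name is hypothetical, and it assumes that hour h covers the interval from h:00 to (h+1):00, so that each boundary hour belongs to the tier that starts at it.

```python
def electricity_price(hour: int) -> float:
    """Return the electricity price ($/MW) for an hour of the day in 0..23."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in 0..23")
    if hour < 8 or hour >= 22:                 # 0:00-8:00 and 22:00-24:00
        return 105.06
    if 8 <= hour < 12 or 18 <= hour < 22:      # 8:00-12:00 and 18:00-22:00
        return 130.36
    return 177.24                              # remaining hours, 12:00-18:00


# Price profile for one scheduling day
daily_prices = [electricity_price(h) for h in range(24)]
```

Under this convention the day contains 10 hours at the lowest price, 8 hours at the middle price and 6 hours at the highest price.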

Evaluation of the proposed control model
In order to verify the effectiveness and advancement of the proposed model, the following methods are selected for comparative analysis. The reward variation of the proposed method during the training process is shown in Figure 5, where the performance of the proposed method is evaluated by the variation of the reward value.

Optimization results for gas system.
Comparison of carbon emission.

FIGURE 10
Comparison of wind power consumption.

Feng et al. 10.3389/fenrg.2024.1428624
Since the parameters of the action neural network are randomly initialized at the beginning of training, the agent initially does not know how to make decisions that reduce the total operating cost. It therefore explores the environment to gain experience. The experience gained during this pre-training is stored in the experience pool and used to optimize the control strategy through the experience replay mechanism. At each iteration step, a batch of historical training data is sampled from the experience pool to update the parameters of the action and critic neural networks. As can be seen from Figure 5, the cumulative reward earned by the agent gradually increases during training. After the first 100 stochastic optimizations, the reward rises rapidly, and the curve starts to converge after about 1,000 training iterations. After training, the proposed model has acquired the ability to make optimal decisions in new environments. One particular day of data is chosen for the validation analysis, with the details displayed in Figure 6. As can be seen, the WT output is higher in the early hours of the morning, while the electrical load is smaller, so the system struggles to consume all of the wind power. During 5:00-24:00, the WT output remains below the electrical load.
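The experience pool and replay mechanism described above can be sketched as a fixed-capacity buffer with uniform random sampling. This is a generic illustration rather than the authors' implementation: the class name `ReplayBuffer`, the capacity of 1,000, the batch size of 64 and the placeholder transitions are all assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience pool with uniform random sampling."""

    def __init__(self, capacity):
        # A bounded deque discards the oldest experience once it is full.
        self._pool = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self._pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sample without replacement; never ask for more than is stored.
        size = min(batch_size, len(self._pool))
        return random.sample(list(self._pool), size)

    def __len__(self):
        return len(self._pool)


# Hypothetical usage with placeholder transitions.
buffer = ReplayBuffer(capacity=1000)
for step in range(1200):
    buffer.store(state=step, action=0.0, reward=-1.0,
                 next_state=step + 1, done=False)
batch = buffer.sample(64)
```

Sampling uniformly from stored transitions breaks the temporal correlation between consecutive time steps, which is one reason replay is commonly used with time-series environments.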
In this paper, three algorithms, particle swarm optimization (PSO) (Du et al., 2023), multi-agent deep deterministic policy gradient (MADDPG) (Abid et al., 2024), and multi-agent twin delayed deep deterministic policy gradient (MATD3) (Wang et al., 2023b), are selected for comparison to verify the reliability and robustness of the proposed approach. The results of the proposed method and the comparison schemes on the test set are shown in Table 3. PSO performs the worst on the complex IEGS and fails to obtain the optimal scheduling scheme. Compared with PSO, MADDPG employs multiple agents for optimal scheduling, and communication between agents enables coordinated management between energy sources and hence better performance. MATD3 adds two sub-networks for Q-value estimation to address Q-value overestimation, which is a further improvement over MADDPG. The proposed scheme adopts MASAC as its kernel, which increases the stochasticity of the scheduling process by adding an entropy term, to obtain the optimal scheduling strategy. Compared with PSO, the proposed approach reduces the total cost by $35,670.66 and carbon emission by 173.52 tons.
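The difference between the twin-network target of TD3-style methods and the entropy-regularized target of soft actor critic methods, both mentioned above, can be illustrated with scalar values. The sketch below uses the standard single-agent forms of these targets and omits details such as target policy smoothing and the multi-agent critic inputs; the function names, the temperature `alpha` and all numbers are illustrative assumptions.

```python
def twin_critic_target(reward, gamma, q1_next, q2_next, done):
    """TD3-style target: bootstrap from the smaller of two Q estimates."""
    q_next = min(q1_next, q2_next)  # counteracts Q-value overestimation
    return reward + gamma * (1.0 - float(done)) * q_next


def soft_target(reward, gamma, q1_next, q2_next, log_prob_next, alpha, done):
    """SAC-style target: twin-critic minimum plus an entropy bonus."""
    # -alpha * log_prob rewards actions drawn from a more stochastic policy.
    q_next = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - float(done)) * q_next


# Hypothetical numbers for a single transition.
y_td3 = twin_critic_target(reward=1.0, gamma=0.9,
                           q1_next=10.0, q2_next=8.0, done=False)
y_sac = soft_target(reward=1.0, gamma=0.9, q1_next=10.0, q2_next=8.0,
                    log_prob_next=-1.0, alpha=0.2, done=False)
```

With a negative log-probability, the soft target is larger than the twin-critic target, so the entropy term favours policies that keep exploring.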
The detailed outputs of the electrical system and the gas system under the proposed method on a particular day of the test set are presented in Figures 7 and 8, respectively. From Figure 7, it can be observed that during 0:00-8:00, owing to the lower electricity price and higher wind power output, P2G consumes more electricity for CH4 production, which reduces the CO2 released by the system while consuming wind power, and the GT is almost inactive. During this period, the BESS charges and the CCS consumes electricity to capture carbon. During 8:00-12:00, as the electricity price increases, the power consumed by the P2G and CCS starts to decrease, the GT gradually starts to work, and the BESS releases its stored energy. During 12:00-18:00, when the tariff reaches its maximum value, the GT runs at its maximum power for gas-to-power conversion, which reduces the cost of purchasing electricity for the system. It can also be seen from Figure 8 that the proposed model increases power-to-gas conversion when the electricity price is low and, in contrast, increases gas-to-power conversion when the electricity price is high.

Evaluation of the proposed GCCP model
To verify the validity of the GCCP model, this paper constructs four scenarios for simulation analysis. The economic scheduling strategy in scenario 1 considers neither CCS nor P2G. The strategy in scenario 2 considers only CCS. The strategy in scenario 3 considers mainly P2G. The strategy in scenario 4 introduces the GCCP proposed in this paper.
The comparison of CO2 emissions under the different scenarios is displayed in Figure 9. Comparing the carbon emissions of scenario 1 and scenario 2, it can be observed that the carbon capture device can significantly reduce the CO2 emissions of the IEGS: a total of 1,210 tons of CO2 is removed in scenario 2 compared with scenario 1, which is about 53.4% of the total emissions. Comparing scenario 1 and scenario 3, the CO2 emissions of the system are almost unchanged because the CO2 required for P2G is purchased from an external source. The higher carbon emissions of scenario 1 relative to scenario 3 during 0:00-6:00 arise because, in scenario 3, P2G converts excess wind energy into gas, which reduces the amount of gas purchased and thus the carbon emissions. When the GCCP model is introduced in scenario 4, the carbon emissions at each hour are significantly reduced compared with the other scenarios, with a total reduction of 1,476 tons compared with scenario 1.
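The reported reductions can be put on a common footing with a short calculation. The sketch below assumes that the 53.4% figure is measured against the total emissions of scenario 1; under that assumption it derives an implied baseline and the corresponding share for the GCCP case. These derived values are back-of-the-envelope estimates and are not reported in the paper.

```python
# Figures quoted in the text (tons of CO2 relative to scenario 1).
CCS_REDUCTION_T = 1210.0      # scenario 2 vs scenario 1
CCS_REDUCTION_SHARE = 0.534   # stated as about 53.4% of total emissions
GCCP_REDUCTION_T = 1476.0     # scenario 4 vs scenario 1

# Assumption: the 53.4% share refers to scenario 1's total emissions.
baseline_t = CCS_REDUCTION_T / CCS_REDUCTION_SHARE
gccp_share = GCCP_REDUCTION_T / baseline_t
extra_vs_ccs_t = GCCP_REDUCTION_T - CCS_REDUCTION_T

print(f"implied scenario 1 emissions: {baseline_t:.0f} t")
print(f"implied GCCP reduction share: {gccp_share:.1%}")
print(f"additional reduction of GCCP over CCS alone: {extra_vs_ccs_t:.0f} t")
```

Under this reading, the implied baseline is roughly 2,266 tons and the GCCP reduction corresponds to about 65% of it, that is, 266 tons more than CCS alone.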
The wind power output under the different scenarios is shown in Figure 10. Comparing scenario 2 with scenario 1 during the peak period of wind power output from 1:00 to 5:00, it can be observed that wind power consumption improves partially after the CCS device is introduced. Comparing scenario 1 and scenario 3, it is clear that the P2G device can significantly increase the wind power output, although the wind power in scenario 3 does not reach its maximum value because of the maximum input power of the device. In scenario 4, the GCCP coupling device significantly enhances wind power consumption, which reaches 92.81%.
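A wind power consumption rate such as the one quoted above is commonly computed as consumed wind energy divided by available wind energy over the horizon. The sketch below uses that definition as an assumption, since the exact formula is not given in this section; the function name and the four-hour profile are hypothetical and are not data from the paper.

```python
def wind_consumption_rate(available_mw, consumed_mw):
    """Share of available wind energy that is actually consumed (0..1)."""
    if len(available_mw) != len(consumed_mw):
        raise ValueError("profiles must have the same length")
    total_available = sum(available_mw)
    if total_available <= 0:
        raise ValueError("no wind power available")
    # Consumption in each interval cannot exceed what is available.
    total_consumed = sum(min(c, a)
                         for a, c in zip(available_mw, consumed_mw))
    return total_consumed / total_available


# Hypothetical 4-hour profile, not data from the paper.
available = [100.0, 80.0, 60.0, 40.0]
consumed = [70.0, 80.0, 60.0, 40.0]
rate = wind_consumption_rate(available, consumed)
abandonment = 1.0 - rate  # wind abandonment rate under the same definition
```

With this definition, curtailment in a single high-wind hour is enough to pull the daily rate below 100%, which matches the early-morning pattern described in the text.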
The voltage values of the proposed approach at each node and each moment on a certain day of the test set are shown in Figure 11, from which it can be seen that although the voltage values of all nodes fluctuate considerably from one moment to the next, the voltage always remains within the range [0.96, 1.04]. This satisfies the requirement for stable and secure operation of the system, which again demonstrates the effectiveness of the proposed approach in voltage control.

FIGURE 11
Voltage profiles of power system.

Conclusion
In this study, a MADRL-based IEGS scheduling approach that incorporates GCCP and simultaneously considers system security and economy is proposed. MADRL draws on historical data to mitigate the negative impacts of time-series data and uses efficient exploration techniques to search for the optimum. Agents seek optimal control strategies by continuously exchanging information with each other. The detailed conclusions of the study are summarized as follows: (1) Compared with the other methods considered, the proposed control framework and approach provide the best performance. (2) The detailed, explainable behaviour of the IEGS components provides additional validation of the proposed control framework. (3) The effectiveness of the proposed GCCP model is verified through four different scenarios: it reduces carbon emissions by 1,476 tons and increases the proportion of wind power consumption by 4.41% compared with scenario 1.

$$\left\{P_{grid,i,t},\ \phi_{grid,i,t},\ P_{CCS,i,t},\ P_{Wind,i,t},\ P_{GT,i,t},\ SOC_{ESS,i,t},\ V_{grid,i,t},\ M_{grid,t}\right\}, \quad \left\{E_{Gas,a,t},\ E_{P2G,a,t},\ M_{gas,t}\right\} \tag{43}$$

where $P_{grid,i,t}$ and $\phi_{grid,i,t}$ indicate the active and reactive power demanded by the load at grid node $i$ at time $t$, respectively; $P_{Wind,i,t}$ denotes the active power injected into node $i$ at time $t$ by the WT; $SOC_{ESS,i,t}$ indicates the capacity ratio of the ESS at node $i$ at time $t$; $V_{grid,i,t}$ denotes the voltage value of the grid at node $i$ at time $t$; and $M_{grid,t}$ and $M_{gas,t}$ are the prices of grid electricity and gas, respectively.

• Action: The action ensemble $A_t^a$

TABLE 1
Electricity price.

TABLE 3
Comparison results of various approaches on the test day.