Stackelberg game-theoretic model for low carbon energy market scheduling

Excessive carbon emissions have posed a threat to sustainable development. An appropriate market-based low carbon policy is essential to any regulatory strategy for reducing carbon emissions in the energy sector. This study proposes a Stackelberg game-theoretic model to determine an optimal low carbon policy design in the energy market. To encourage fuel switching to low-carbon generating sources, the effects of a varying carbon price on generators' profits are evaluated. Meanwhile, to reduce carbon emissions caused by energy consumption, carbon tracing and billing incentive methods for consumers are proposed. The efficiency of the low carbon policy is ensured by maximising social welfare and overall carbon reduction from economic and environmental perspectives. A bi-level multiobjective optimisation immune algorithm is designed to dynamically find optimal policy decisions at the leader level, and optimal generation and consumption decisions at the followers level. Case studies demonstrate that the designed model leads to better carbon mitigation and social welfare in the energy market. The proposed methodology can save up to 26.41% of carbon emissions from the consumption side for the UK power sector and promote 31.45% more electricity generation from renewable energy sources.


Introduction
The global population will increase from 7 billion to 9 billion over the next 40 years, with energy demand rising by 50% [1]. In the energy sector, fossil fuels (including coal, gas, and oil) supply around 80% of total energy demand [2]. This will still continue for decades until renewable energy sources become the main energy supplies. During the combustion of fossil fuels, enormous quantities of carbon dioxide are emitted, which leads to global warming and irreversible effects of climate change [3]. Facing these climate issues, low carbon policy encourages generators to switch fuels to renewable energy sources through increasing carbon cost of fossil fuel-based power plants. Meanwhile, low carbon policy needs to incentivise consumers to be aware of emission differences caused by various consumption patterns and regions. With smart grid technologies, generators and consumers are able to bidirectionally communicate with energy markets [4], such that policy-makers can adapt policies of energy market scheduling to achieve low carbon targets.
The carbon price, as a market-based climate policy, is a primary economic instrument to address carbon emissions caused by fossil fuel-based generation in the energy sector [5]. However, an inappropriate carbon price would inefficiently deliver low carbon transition and emission targets, which is a challenge of most current low carbon policies [6]. If the carbon price lies below the estimates of social cost of carbon or the rates at which reduction target will be realised, such as carbon price of EU emission trading scheme, it fails to incentivise generators for low carbon transition to renewable energy sources; if the carbon price is set too high, the gap between high price entities and entities with low or without carbon price would harm business competitiveness. The price gap also results in a carbon leakage issue, which means that entities will discharge emissions in the region with low carbon price but overall emissions are not reduced [7]. To manage these inappropriate carbon price issues, carbon price floor (CPF) and ceiling are implemented in the current international carbon market [8]. The UK CPF is set to compensate low carbon price of EU emission trading scheme since 2013 [9]. New Zealand creates a price ceiling through fixed price option [10]. The US regional greenhouse gas initiative creates a carbon price corridor by setting price floor and ceiling [11]. Nonetheless, these international experiences have so far produced an insufficient carbon pricing scheme to adequately incentivise low carbon energy transition. For instance, after introducing CPF in the UK energy market, the carbon price has been frozen at its floor price since 2015 and thus renewable transition slows down [12]. One of the prime reasons of this frozen carbon price is that the responding information from energy market participants, i.e. generators and consumers, is not dynamically grasped to adopt low carbon policy, and the consumers are not involved in the policy design. 
This provides opportunities for using game-theoretic strategy to design a dynamic carbon price scheme to optimally interact between the policymaker and energy market participants, in particular energy consumers.
Low carbon energy market scheduling refers to strategically dispatching energy sources for generating electricity with the objectives of carbon mitigation, cost saving, and electricity bill reduction. Game-theoretic strategy is gaining increasing attention as an analysis tool for scheduling and modelling interactions in energy markets. Cournot and Stackelberg are two classic game-theoretic models for energy market scheduling. In the Cournot model, players make decisions independently and simultaneously [13]. By contrast, the Stackelberg game-theoretic model features a hierarchical two-level decision-making approach [14]. These two levels are the leader level and the follower level, according to the action sequence of players. The leader makes decisions first, anticipating the actions of each follower, who makes subsequent decisions in response to the leader's strategies [15]. During low carbon energy market scheduling, policy-makers, including regulators and system operators, formulate low carbon policy prioritising market changes or responses from generators and consumers, so as to effectively yield the largest social welfare [16]. This forms a sequential action structure and therefore motivates us to utilise the Stackelberg game-theoretic model. This model can capture the heterogeneous interactions of participants in the low carbon energy market, and evaluate the effectiveness of a policy scheme by gathering response information from both generators and consumers.
With respect to related works of game-theoretic strategy, Cintuglu et al. [17] designed a game-theory method to model the operating cost, pollutant emissions, and cost of power exchange for generators. Further works [18][19][20] extend the game-theoretic power system scheduling to involve the role of consumers. The objective of consumers is to minimise their electricity payment bills by managing them as a single unit [18], through aggregators [20] or controlling appliances [19]. The research in [18] uses carbon reduction as a shared objective function of generators and consumers, but practically generators pay more attention to the rise of cost due to purchasing carbon price and consumers preferably adapt their consumption patterns to lower electricity bills. For this reason, the game-theoretic interactions between generators and consumers may not sufficiently deliver the low carbon target. The role of the policy-maker is equally important for low carbon energy market scheduling through policy incentives. A recent study for carbon taxes design in the electricity sector [21] considers interactions between the policy-maker and generators. Minimum tax rates are decided to effectively achieve a carbon target. There are still great opportunities in considering energy market participants including generators, consumers, and policy-makers together and applying policy measurement such as social welfare to guarantee energy market efficiency.
Moreover, the carbon emissions incurred by consumption patterns and locations may vary, which is a primary driver for carbon emissions in the generation side. From policy-maker's perspective, carbon accounting in consumers' level is necessary for formulating appropriate monetary incentives for specific group of consumers to change their consumption for carbon reduction. This monetary compensation comes from the carbon tax received from generators to realise tax neutralising. The method of carbon accounting for consumers is explored in [22] as a concept of carbon emission flow (CEF). The CEF is applied in power systems as a network flow accompanying power flow in [23]. A similar method is extended to a mathematical model in [24] for tracing, calculating, and distributing onto each of consumers to analyse carbon emission differences caused by consumption behaviours. The matrix calculation of CEF is based on a known distribution of optimal power flow. For the traditional optimal power flow which is determined by the objective of minimising operating cost of generators as proposed in [24], the CEF caused by consumers' consumption patterns is well evaluated. However, the performance of the CEF model may degrade when considering low carbon scheduling of consumers, because both the power flow and CEF of consumers are considered as decision variables, instead of only mitigating carbon emissions with power flow distribution unchanged. In addition, from a policy-maker's perspective, how to use carbon tax to incentivise consumers' low carbon energy consumption behaviours is missing in existing studies. This paper approaches low carbon energy market scheduling to solve the aforementioned issues considering several gaps in existing studies. 
Instead of finding minimum tax rates as in existing studies, we design an optimal carbon price and compensation scheme to fairly abate carbon emissions to below the emission targets, whilst guaranteeing the maximisation of social welfare brought by low carbon policy. From the generators' perspective, rather than focusing on fundamental costs (operation, maintenance, and other financial costs), we couple the emission trading market with the energy market by involving carbon cost. This encourages generators to decide between fuel switching and paying the carbon price. From the consumers' perspective, although the carbon cost is charged to generators, they will pass some of this carbon cost on to consumers as increased electricity bills. However, consumers in high carbon intensity regions and time periods are not incentivised. Our research adjusts previous low carbon policy with compensation to accommodate demand side participation. Compared with the existing works, this paper's contributions are:
• This paper proposes a novel low carbon policy design by dynamically setting optimal carbon prices and compensation rates to combat the aforementioned issues brought by inappropriate carbon prices. In contrast to existing studies on carbon tax design, which minimise tax rates while maintaining carbon emissions below the targets, we use the maximisation of social welfare and carbon reduction to realise a trade-off between market efficiency and carbon abatement.
• This paper extends the existing low carbon power systems scheduling model by involving the effects of low carbon policy on consumers' payment bills and the carbon cost of generators. This extension contributes to enforcing fuel switching by generators and demand side carbon abatement by consumers in specific regions and time periods. The overall CEF and power flow are scheduled as the decisions of consumers and generators. This differs from existing studies, which calculate CEF based on a known optimal power flow.
The remainder of this paper is structured as follows. Section 2 introduces the low carbon energy market scheduling framework and discusses the CEF model. The framework of the Stackelberg game-theoretic model is described in Section 3. Section 4 describes the solution techniques for Stackelberg game-theoretic scheduling. Section 5 provides case studies to demonstrate the proposed model. Finally, Section 6 draws the conclusions.

System model
In this section, the overall framework of low carbon energy market scheduling is illustrated and the CEF method is then discussed as a preliminary of Stackelberg game-theoretic model.

Low carbon energy market scheduling framework
The overview framework of low carbon energy market scheduling is presented in Fig. 1. The electricity generation sources consist of coal, nuclear, wind, combined cycle gas turbine (CCGT), hydro, oil, open cycle gas turbine (OCGT), solar, and bioenergy. The consumers are residential, commercial, and industrial users. The proposed Stackelberg game-theoretic model primarily performs two functions: (i) Before announcing finalised low carbon policy, the proposed model provides an efficient tool for the policy-makers to evaluate the potential policy impacts on generators and consumers to obtain the optimal carbon price and monetary compensation rates for carbon reduction; (ii) After announcing low carbon policy, the proposed model provides an ancillary service for the policy-makers to communicate with generators and consumers through smart grid communication infrastructure, such that the real-time decision variables including carbon prices, monetary compensation, power generation and consumption can be optimally decided.
The balancing and settlement code central services are an existing IT system in the UK energy market as shown in the shaded grey parts in Fig. 1. For the second function, the proposed model is compatible with existing systems for transmitting additional information of real-time CEF, carbon price, and monetary compensation rate. The policy-makers including system operator and regulator announce electricity price, carbon price, and monetary compensation rate through the settlement administration agent and distribute this information through communication lines. The information of carbon emission caused by consumption, power demand, electricity payment bill, and monetary compensation rate for carbon reduction is bidirectionally transmitted between the policy-maker and the consumer through home area networks and neighbourhood area networks. Meanwhile, the information of carbon emission caused by generation, power supply, generating profit, and carbon price is bidirectionally transmitted between policy-maker and generators. The data from generators and consumers is collected by the central data collection agent and the overall generation and consumption are aggregated by the supplier volume allocation agent. After data collection and aggregation, our proposed Stackelberg game-theoretic model conducts low carbon energy market scheduling through capturing the interactions among policy-maker, generators, and consumers. Specifically, generators and consumers in the followers level receive carbon price and monetary compensation, respectively, from the policy-maker at the leader level. Subsequently, generators seek to maximise their profits considering carbon costs, while consumers seek to minimise their payment bills with monetary incentives for carbon abatement in high emission regions and time periods. The decisions of generators (power generation) and consumers (power consumption) are transmitted to the policy-makers as responding strategies. 
The policy-makers dynamically adjust the carbon price and monetary compensation rates by maximising social welfare and carbon reduction.

Carbon emission flow
The goal of CEF tracing is to analyse potential differences in carbon emissions caused by various electricity consumption patterns and locations, as well as generation sources, before formulating an efficient policy measure of low carbon energy market scheduling for the overall power systems through renewable energy utilisation and demand side management. To trace the complete carbon footprint of power systems, a virtual concept of CEF proposed by Li et al. [23] and Kang et al. [24] is considered. In our paper, this model is extended to involve electricity generation by major sources in specific power plant and obtain optimal CEF distribution using Stackelberg game-theoretic model considering policy incentivised fuel switching and demand side management. We will first briefly discuss Kang's CEF model, and the reader can refer to [24] for further details.
In power systems, power plants are represented by outflow buses, substations with consumers are represented by inflow buses, and transmission networks are represented by branches. The CEF is a virtual network flow used to trace the carbon footprint flowing through power systems. Concurrent with the power flow, the CEF is generated at outflow buses before being transmitted along the branches into inflow buses. The CEF is traced over the whole power network and accumulated at the consumer side to quantify carbon emissions by abstracting the network features of transmission branches. Hence, the carbon emission responsibility for the transmission and consumption sides can be fairly allocated, instead of simply attributing emission responsibility to the generation side. Although transmission and consumption do not directly produce carbon emissions in reality, power generation and the corresponding carbon emissions are primarily driven by the demand of the consumption side. To evaluate the distribution and movement of CEF, two metrics are defined in [24]:

(i) CEF rate: The CEF rate is defined as the amount of CEF passing a point of the power network per unit of time, with a unit of tCO2/h

$$R = \frac{\mathrm{d}F}{\mathrm{d}t} \tag{1}$$

where $R$ is the CEF rate, $F$ is the CEF, and $t$ is the time slot, which is defined as each hour of the scheduling horizon in this paper.

(ii) CEF intensity: The CEF intensity is defined as the amount of CEF at a specific point of a branch or bus per unit of active power flow. It describes the relationship between CEF and power flow in power networks, with a unit of tCO2/MWh

$$e = \frac{\mathrm{d}F}{\mathrm{d}E} = \frac{R}{P} \tag{2}$$

where $e$ is the CEF intensity, $E$ is the electrical energy with the unit of MWh, and $P$ is the active power with the unit of MW. Consider a smart power grid and let $\mathcal{N}$ with size $N$ denote the set of outflow buses representing power plants, indexed by the integer $g$. Each outflow bus $g \in \mathcal{N}$ comprises one or more generation sources. Define $\mathcal{M}$ with size $M$ as the set of inflow buses representing consumers, indexed by the integer $k$.

CEF of generators:
The CEF of generators quantifies the portion injected from outflow buses into branches. This carbon emission is produced by the combustion of fossil fuels. According to (2), the CEF rate of the gth power plant is the product of its carbon intensity and active power output

$$R_g = e_g P_g \tag{3}$$

where $R_g$, $P_g$, and $e_g$ are the CEF rate, active power, and CEF intensity of the gth power plant, respectively. In our research, in order to incentivise the internal carbon abatement of power plants through fuel switching, we extend the CEF of generators to consider the electricity generation by major sources. For a power plant with a single generation source, its CEF intensity is determined by the carbon emission factor of the fuel and the fuel consumption rate [25]. For a multi-source power plant, the CEF intensity of the outflow bus is determined by all the sources at this node

$$e_g = \frac{\sum_{u=1}^{N_u} P_{g,u}\, e_{g,u}}{\sum_{u=1}^{N_u} P_{g,u}} \tag{4}$$

where $N_u$ is the number of sources in the gth power plant, and $P_{g,u}$ and $e_{g,u}$ are the active power output and CEF intensity of the uth generation source in the gth power plant.
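As a small numerical illustration of the generator-side CEF, the sketch below assumes that the intensity of a multi-source plant is the output-weighted average of its sources' intensities and that the CEF rate is intensity times active power. The function names and the plant data are hypothetical and only serve the example.

```python
from typing import Sequence


def plant_cef_intensity(power_mw: Sequence[float], intensity: Sequence[float]) -> float:
    """Output-weighted CEF intensity (tCO2/MWh) of a multi-source plant."""
    total_power = sum(power_mw)
    if total_power <= 0:
        raise ValueError("plant must have positive total output")
    return sum(p * e for p, e in zip(power_mw, intensity)) / total_power


def plant_cef_rate(power_mw: Sequence[float], intensity: Sequence[float]) -> float:
    """CEF rate (tCO2/h): plant intensity times total active power output."""
    return plant_cef_intensity(power_mw, intensity) * sum(power_mw)


# Hypothetical plant: 300 MW of coal and 100 MW of wind (illustrative numbers).
power = [300.0, 100.0]
intensity = [0.9, 0.0]

e_g = plant_cef_intensity(power, intensity)
r_g = plant_cef_rate(power, intensity)
print(f"plant intensity: {e_g:.3f} tCO2/MWh, CEF rate: {r_g:.1f} tCO2/h")
```

Shifting output from the coal unit to the wind unit lowers both values, which is the fuel-switching effect the extended generator model is meant to capture.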

CEF of branches:
The CEF of branches is a mix of CEF from various outflow buses. The CEF intensity is the same across any particular branch cross section. Due to the topology of power networks, the power flow distribution and the relationship between power inflow and outflow are based on the proportional sharing principle [26]. Relating this principle to CEF tracing, we can describe how much CEF from a particular outflow bus goes into a branch, and subsequently into a particular inflow bus. According to the proportional sharing principle, a bus receives CEF from several sources in a given proportion and distributes this CEF to each outflow in the same proportion. Fig. 2 takes bus $z$ as an example. $Z^+$ and $Z^-$ are the inflow and outflow sets of branches of bus $z$. The inflow power on the ith branch and the outflow power on the jth branch are $P_i$ and $P_j$, respectively. The generator active power output is $P_g$. Besides, the power consumption at bus $z$ and the power loss of a branch can each be taken as another outflow branch. Hence, we can use the proportional sharing principle to describe the relationships between $P_i$, $R_i$ and $P_j$, $R_j$. Defining $P_{i,j}$ and $P_{g,j}$ as the shares of the power flow in the jth branch which come from the ith inflow and the generator, respectively, we have

$$P_{i,j} = \frac{P_i}{\sum_{i \in Z^+} P_i + P_g}\, P_j \tag{5}$$

$$P_{g,j} = \frac{P_g}{\sum_{i \in Z^+} P_i + P_g}\, P_j \tag{6}$$

The CEF rate in the jth outflow branch $R_j$ is the sum of the CEF rates contributed by the inflow set of branches and the generator

$$R_j = \sum_{i \in Z^+} e_i P_{i,j} + e_g P_{g,j} \tag{7}$$

where $e_i$ and $e_g$ are the CEF intensities of the ith branch and the generator, respectively. Using (5) and (6), the CEF intensity $e_j$ of the jth branch is

$$e_j = \frac{R_j}{P_j} = \frac{\sum_{i \in Z^+} e_i P_i + e_g P_g}{\sum_{i \in Z^+} P_i + P_g} \tag{8}$$
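The proportional sharing principle can be checked numerically. The sketch below assumes that each outflow branch draws from every inflow (and from the local generator) in proportion to that inflow's share of the total power entering the bus. The function name and the bus data are hypothetical.

```python
from typing import List, Sequence, Tuple


def outflow_cef(
    inflow_power: Sequence[float],
    inflow_intensity: Sequence[float],
    gen_power: float,
    gen_intensity: float,
    outflow_power: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (CEF rate, CEF intensity) for each outflow branch of one bus."""
    total_in = sum(inflow_power) + gen_power
    results = []
    for p_j in outflow_power:
        # Share of branch j's power that originates from each inflow / the generator.
        inflow_shares = [p_i * p_j / total_in for p_i in inflow_power]
        gen_share = gen_power * p_j / total_in
        rate_j = sum(s * e for s, e in zip(inflow_shares, inflow_intensity))
        rate_j += gen_share * gen_intensity
        results.append((rate_j, rate_j / p_j))
    return results


# Hypothetical bus: two inflow branches, one local generator, two outflows (MW, tCO2/MWh).
branches = outflow_cef(
    inflow_power=[60.0, 40.0],
    inflow_intensity=[0.8, 0.2],
    gen_power=100.0,
    gen_intensity=0.5,
    outflow_power=[150.0, 50.0],
)
for rate, intensity in branches:
    print(f"rate = {rate:.2f} tCO2/h, intensity = {intensity:.3f} tCO2/MWh")
```

Both outflows end up with the same intensity (the power-weighted mix of everything entering the bus), and their rates sum to the CEF entering the bus, so the bus neither creates nor destroys CEF.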

Carbon emission loss of branches:
The carbon emission loss of branches is incurred by transmission loss. Additional power needs to be generated to balance transmission loss, which causes additional carbon emissions. The carbon emission loss of a specific branch is equal to the difference between the CEF flowing into and out of the branch. Since the carbon emission loss of a branch can be taken as a load on this branch, the intensity of emission loss is the same as the CEF intensity of the branch, $e_l = e_j$. The carbon emission loss rate of the lth branch is

$$R_l = e_l P_l \tag{9}$$

where $R_l$, $P_l$, and $e_l$ are the carbon emission loss rate, power loss, and carbon emission loss intensity of the lth ($l = 1, \ldots, L$) branch, respectively.

CEF of consumers:
The CEF of consumers is delivered from branches into inflow buses and caused by electricity consumption. Similarly, the CEF rate of the kth consumer is

$$R_k = e_k P_k \tag{10}$$

where $R_k$, $P_k$, and $e_k$ are the CEF rate, power consumption, and CEF intensity of the kth consumer, respectively. Since the active power load is represented by an outflow branch, (8) can also be applied to the CEF intensity of consumers

$$e_k = \frac{\sum_{i \in Z^+} e_i P_i + e_g P_g}{\sum_{i \in Z^+} P_i + P_g} \tag{11}$$

where $Z^+$ is the inflow set of branches of the bus serving consumer $k$. It can be observed from (11) that the CEF intensity of consumers is determined by all inflows from generators and branches. For this reason, the CEF intensity of consumers holds the key to determining CEF intensities in the overall power network. Therefore, minimising the CEF intensity of consumers contributes to the reduction of carbon emissions for the whole power system. In addition, within a power network, both power flow and CEF are conserved, which means that the total inflow and outflow of power flow and CEF are balanced at any given time period

$$\sum_{g \in \mathcal{N}} P_g = \sum_{k \in \mathcal{M}} P_k + \sum_{l=1}^{L} P_l \tag{12}$$

$$\sum_{g \in \mathcal{N}} R_g = \sum_{k \in \mathcal{M}} R_k + \sum_{l=1}^{L} R_l \tag{13}$$

Therefore, the CEF relationship can be incorporated into the Stackelberg game-theoretic scheduling model to find the optimal CEF distribution. Unlike the Stackelberg game-theoretic framework in [18,21], which only considers the carbon emissions caused by power generation, the proposed model provides a fair low carbon energy market scheduling that involves the roles of transmission and consumption.
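The conservation of power flow and CEF can be illustrated on the smallest possible network. The sketch below assumes one generator feeding one consumer over a single lossy branch, with the loss treated as a load carrying the branch intensity. All numbers are hypothetical.

```python
import math

# Hypothetical single-line system (MW and tCO2/MWh).
gen_power, gen_intensity = 100.0, 0.8
line_loss = 4.0
load_power = gen_power - line_loss      # power flow conservation

# With a single source there is no mixing, so the branch keeps the generator intensity.
branch_intensity = gen_intensity

gen_rate = gen_intensity * gen_power        # CEF injected at the outflow bus
loss_rate = branch_intensity * line_loss    # CEF attributed to transmission loss
load_rate = branch_intensity * load_power   # CEF delivered to the consumer

# CEF conservation: what is injected equals what is consumed plus what is lost.
cef_balanced = math.isclose(gen_rate, load_rate + loss_rate)
print(f"injected {gen_rate:.1f}, consumed {load_rate:.1f}, lost {loss_rate:.1f} tCO2/h")
```

The same balance check extends to larger networks by summing over all outflow buses, inflow buses, and branches.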

Game theory framework
This section describes a mathematical representation of the Stackelberg game-theoretic model for low carbon energy market scheduling.

Objective of consumers
The objective of consumers is to minimise their payment bills considering carbon reduction and the decision variable is their electricity demand. With the CEF model, the carbon emissions caused by time-varying and region-varying consumption behaviours can be traced. From previous research, the marginal carbon emissions of varying consumption behaviours are determined by generation mix and market systems [27,28]. Thus, the goal is to design a monetary compensation scheme α k to reduce the carbon emissions caused by time periods and regions with higher carbon emission rates and consider this monetary compensation as a manner of payment bills compensation.
To study consumers' responding strategies, the Stackelberg game-theoretic model for consumption scheduling [18,19] is considered. In our paper, this model is extended to involve the time-varying and region-varying features of carbon emissions and to reduce the emissions in those time periods and regions with high carbon intensities. We first discuss the common payment bill minimisation model in [18][19][20]. To analyse the responding strategies of consumers and minimise their overall payment bills, the consumers are considered as an aggregated unit during the optimisation process, which is the focus of [18]. The payment bill that consumers seek to minimise is given by (14), where $\pi$ is the electricity price during each time slot $t$.
Let us consider the monetary compensation for carbon reduction of consumers. When the CEF rate of consumption after responding to the leader's strategy, $R_k$, is higher than or equal to that before responding to the leader's strategy, $R_k^0$, consumers will not receive any monetary compensation. In contrast, when $R_k$ is less than $R_k^0$, consumers will receive monetary compensation from the policy-maker. The monetary compensation for carbon reduction at a high CEF rate level is higher than that at a low CEF rate level. Hence, the relationship between the monetary compensation and the CEF rate change is described by (15), where $M_k$ is the monetary compensation received by consumer $k$ for carbon reduction, $\alpha_k$ is the monetary compensation rate of consumer $k$, and $R_k^0$ and $R_k$ are the CEF rates of consumer $k$ before and after deploying the monetary compensation strategy, respectively.
34 IET Smart Grid, 2020, Vol.
Proof: Carbon reduction at a high CEF rate level will receive more monetary compensation, which means that the monetary compensation rate $\alpha_k = \mathrm{d}M_k/\mathrm{d}R_k$ linearly increases with the amount of CEF rate reduction. The second-order derivative of (15) (when $R_k^0 > R_k$) with respect to $R_k$ should be positive, $\mathrm{d}^2 M_k/\mathrm{d}R_k^2 > 0$. Thus, (15) implies the aforementioned monetary compensation strategy. □
Therefore, the payment bill optimisation problem for the consumption side is modelled as Objective I (min. payment bills of consumers), given by (16) subject to (17), where $P^{\max}$ is the maximum capacity of power networks.
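To make the shape of the compensation rule concrete, the sketch below uses a quadratic form as an assumption: it is not the paper's equation (15), only one function that pays nothing when emissions do not fall and has a positive second derivative when they do. The names `compensation` and `net_bill` and all numbers are hypothetical.

```python
from typing import Sequence


def compensation(r_before: float, r_after: float, alpha: float) -> float:
    """Assumed quadratic compensation: zero unless the CEF rate is reduced."""
    reduction = r_before - r_after
    if reduction <= 0:
        return 0.0
    # Marginal compensation grows linearly with the reduction (convex in r_after).
    return 0.5 * alpha * reduction ** 2


def net_bill(prices: Sequence[float], demand: Sequence[float], paid_compensation: float) -> float:
    """Payment bill over the horizon minus the compensation received."""
    return sum(p * d for p, d in zip(prices, demand)) - paid_compensation


# Hypothetical consumer: CEF rate falls from 10 to 8 tCO2/h with alpha = 2.
m_k = compensation(r_before=10.0, r_after=8.0, alpha=2.0)
bill = net_bill(prices=[50.0, 60.0], demand=[1.0, 0.5], paid_compensation=m_k)
print(f"compensation = {m_k:.1f}, net bill = {bill:.1f}")
```

Under this assumed form, the second 2 tCO2/h of reduction earns more than the first, which matches the stated intent that larger reductions are rewarded at a higher rate.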

Objective of generators
The objective of generators is to maximise their profits, which can be described as total revenue minus operating costs. The decision variables of generators are the electricity generation by major sources. In the Stackelberg game-theoretic model of generation scheduling in [18], carbon reduction is taken as a shared objective function of both generators and consumers, but the cost of carbon emissions is not included in the generating costs. Nonetheless, in a practical case, instead of seeking carbon reduction, generators pay more attention to the increase in cost due to paying the carbon price. In order to involve the cost of paying the carbon price for generators under an emissions trading scheme, the concept of clean spread [29] is introduced in our model. This concept quantifies the profit of a power source from selling a unit of electricity and adjusts the fundamental costs (operation, maintenance, and other financial costs) by setting aside carbon cost. In this concept, the clean dark spread refers to the profit of a coal-fired power plant and the clean spark spread refers to the profit of a gas-fired power plant. The climate spread describes the profit difference between the clean dark spread and the clean spark spread [30]. Rather than only considering the emission difference between coal and gas, in our model this concept is extended to electricity generation by all renewable and non-renewable sources. The clean renewable spread and the clean non-renewable spread are given by (18) and (19), respectively, where $S_r$ and $S_n$ are the clean renewable and non-renewable spreads, respectively, $N_{re}$ and $N_{nre}$ are the numbers of renewable and non-renewable sources, respectively, $\beta$ is the carbon price per ton of carbon emissions, and $C_{g,u}$ is the fundamental cost of source $u$ in the gth power plant, evaluated by the levelised cost of electricity (LCoE) generation, which is a measurement standard used by the UK Department for Business, Energy and Industrial Strategy [31].
The LCoE of a specific source is the ratio of the total costs of the source (including capital and operating costs) to the total amount of electricity it generates over its entire lifetime. Future costs and generation are discounted when compared with today's values. Additionally, wider costs which partly fall to others, such as system balancing cost and carbon cost, are not included in the LCoE in this research, because the LCoE only relates to the costs accruing to the owner of the generation asset.

$$C_{g,u} = \frac{Nc_{g,u}}{Np_{g,u}} = \frac{\sum_{a} \dfrac{capex_{g,u}^{a} + opex_{g,u}^{a}}{(1+\zeta)^{a}}}{\sum_{a} \dfrac{P_{g,u}^{a}}{(1+\zeta)^{a}}} \tag{20}$$

where $Nc_{g,u}$ is the net present value of the expected cost of generation source $u$ at the gth power plant, $a$ is the accounting year, $capex_{g,u}^{a}$ is the capital expenditure in year $a$, $opex_{g,u}^{a}$ is the operating expense in year $a$, $Np_{g,u}$ is the net present value of the expected electricity generation by source $u$, $P_{g,u}^{a}$ is the net electricity generation in year $a$, and $\zeta$ is the discount rate. The profit optimisation problem for generators is modelled as Objective II (max. profits of generators), given by (21) subject to (22), where $P_{g,u}^{\min}$ and $P_{g,u}^{\max}$ are the minimum and maximum power outputs of source $u$ at the gth power plant. In addition, the conservation of power flow and CEF, (12) and (13), holds as the generators' constraints. Therefore, when the fundamental cost of generation is considered together with the cost of compliance with carbon policy, internal carbon abatement is achieved through fuel switching in electricity generation. In our simulations, the proposed objective of generators is compared with the objective in [18], which minimises the fundamental costs of generators.
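The LCoE definition and the effect of the carbon price on a source's margin can be sketched in a few lines. The code below assumes year indices starting at 0 and a per-MWh clean spread of the form price minus fundamental cost minus carbon price times emission intensity (with zero intensity for renewables). This form, the function names, and all figures are illustrative assumptions rather than the paper's equations (18) and (19).

```python
from typing import Sequence


def lcoe(capex: Sequence[float], opex: Sequence[float],
         generation_mwh: Sequence[float], discount_rate: float) -> float:
    """Discounted lifetime cost divided by discounted lifetime generation."""
    npv_cost = sum((c + o) / (1 + discount_rate) ** a
                   for a, (c, o) in enumerate(zip(capex, opex)))
    npv_generation = sum(g / (1 + discount_rate) ** a
                         for a, g in enumerate(generation_mwh))
    return npv_cost / npv_generation


def clean_spread(price: float, cost: float, carbon_price: float, intensity: float) -> float:
    """Assumed per-MWh margin after paying for carbon (intensity 0 for renewables)."""
    return price - cost - carbon_price * intensity


# Hypothetical three-year asset: build in year 0, generate in years 1 and 2.
cost_no_discount = lcoe([1000.0, 0.0, 0.0], [0.0, 100.0, 100.0], [0.0, 50.0, 50.0], 0.0)
cost_discounted = lcoe([1000.0, 0.0, 0.0], [0.0, 100.0, 100.0], [0.0, 50.0, 50.0], 0.1)

# Hypothetical prices in GBP/MWh and a carbon price of 18 GBP/tCO2.
coal_margin = clean_spread(price=50.0, cost=30.0, carbon_price=18.0, intensity=0.9)
wind_margin = clean_spread(price=50.0, cost=40.0, carbon_price=18.0, intensity=0.0)
print(cost_no_discount, round(cost_discounted, 2), round(coal_margin, 2), wind_margin)
```

With these made-up numbers the carbon price pushes the coal margin below the wind margin even though coal has the lower fundamental cost, which is the fuel-switching incentive the generator objective is designed to capture.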

Objective of policy-maker
In contrast to low carbon power system scheduling in [21], in which the policy-maker seeks to minimise the tax rate (min β) while maintaining that the total carbon emissions are lower than the emissions target, our research introduces the social welfare as a measurement of low carbon policy. This is because the true target of low carbon policy is using market mechanism to deliver an efficient level of carbon reduction, in which the carbon emission is abated as much as possible without detrimental effects on energy market efficiency such as energy price imbalance [32]. Thus, one of the policy-maker's objectives is to maximise social welfare brought by low carbon policy. Another objective is to maximise carbon reduction. The decision variables of policy-makers are the carbon price and the monetary compensation rates.

Maximisation of social welfare:
The social welfare in this paper is defined as the difference between the benefits and costs brought by low carbon policy. The economic benefits of low carbon policy are the improvement of market surplus plus the policy-maker's revenue through emission trading, as given in (23). The market surplus consists of the gain in generators' profits and the saving in consumers' payment bills. The benefit of the policy-maker's revenue is created by revenue neutrality, through investing in low carbon technologies or compensating other sectors of the economy. In (23), $B$ is the economic benefit of low carbon policy, $P_k^0$ is the power consumption of consumer $k$ before responding to low carbon policy, and $P_{g,u}^0$ is the power output of source $u$ in the gth power plant before responding to low carbon policy. The first term corresponds to the improvement of the overall profits of generators, the second term corresponds to the reduction of the overall payment bills of consumers, and the third term corresponds to the policy-maker's revenue through emission trading.
The cost of low carbon policy is the monetary compensation paid to consumers for carbon reduction. Therefore, the social welfare maximisation problem for the policy-maker can be described as Objective III (max. social welfare), given by (24) subject to (25) and (26), where $\beta^{\max}$ and $\alpha^{\max}$ are the maximum levels of the carbon price and monetary compensation rate, respectively. Equations (25) and (26) correspond to the leader's strategy spaces for the carbon price and monetary compensation, respectively.

Maximisation of carbon reduction:
The second objective of the policy-maker is to mitigate total carbon emissions for the purpose of low carbon energy market development through the following measures: (i) reducing the carbon emissions caused by electricity generation; (ii) facilitating the fulfilment of the renewable obligation by generators through increasing the penetration of renewable power sources. From the first perspective, the objective of the policy-maker is to promote the reduction of the overall CEF rate of power systems, modelled as Objective IV (max. carbon reduction) in (27). From the second perspective, the policy-maker regulates the percentage of renewable power source penetration as a constraint on power generation dispatch, given by (28), where $\gamma$ is the minimum percentage of renewable energy penetration regulated by the policy-maker.
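The renewable penetration requirement can be read as a simple feasibility check on a dispatch. The sketch below assumes the constraint compares the renewable share of total generation with the minimum percentage γ. The source names, outputs, and thresholds are hypothetical.

```python
from typing import Dict, Set


def renewable_share(output_mw: Dict[str, float], renewables: Set[str]) -> float:
    """Fraction of total generation supplied by renewable sources."""
    total = sum(output_mw.values())
    if total <= 0:
        raise ValueError("total generation must be positive")
    return sum(p for name, p in output_mw.items() if name in renewables) / total


def meets_obligation(output_mw: Dict[str, float], renewables: Set[str], gamma: float) -> bool:
    """True when the renewable share is at least the regulated minimum gamma."""
    return renewable_share(output_mw, renewables) >= gamma


# Hypothetical dispatch in MW.
dispatch = {"coal": 300.0, "ccgt": 400.0, "wind": 200.0, "solar": 100.0}
green = {"wind", "solar"}

share = renewable_share(dispatch, green)
print(f"renewable share = {share:.0%}")
print(meets_obligation(dispatch, green, gamma=0.25), meets_obligation(dispatch, green, gamma=0.35))
```

In a scheduling loop, a check of this kind would filter out generator dispatches that the policy-maker does not allow before the leader's objectives are evaluated.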

Problem formulation
The interaction among the policy-maker, generators, and consumers is modelled as a 1-leader, 2-follower Stackelberg game. The leader's strategies are the carbon price $\beta$ and the monetary compensation rate $\alpha_k$ for the $k$th consumer, within the strategy spaces (25) and (26), respectively. By contrast, the strategy of the followers representing consumers is the electricity demand $P_k$ within strategy space (17), and the strategy of the followers representing generators is the electricity generation by major sources $P_{g,u}$ within strategy spaces (12), (13) and (22). The leader's objective function of maximisation of social welfare (24) is represented by $J_{L1}(\beta, \alpha_k, P_k, P_{g,u})$. The leader's objective function of maximisation of carbon reduction (27) is represented by $J_{L2}(P_k)$, which is the leader's selfish objective function subject to constraint (28). The followers' objective functions for consumers (16) and generators (21) are represented as $J_K(\alpha_k, P_k)$ and $J_G(\beta, P_{g,u})$, respectively. The steps to solve this Stackelberg game-theoretic model are as follows.
Step 1: Given the leader's strategies $\beta$ and $\alpha_k$ within the strategy spaces (25) and (26), the followers (consumers and generators) optimise their own objective functions within their strategy spaces (12), (13), (17) and (22) to obtain optimal responding strategies, i.e. the optimal electricity demand $P_k^*$ and the optimal electricity generation by major sources $P_{g,u}^*$, respectively. After every feasible strategy of the leader has been examined, the followers' optimal responding strategies $P_k^*$ and $P_{g,u}^*$ form the sets of optimal responding strategies $U_K$ and $U_G$, respectively.
Step 2: Based on each of the identified optimal responding strategies in the sets $U_G$ and $U_K$ of generators and consumers, the leader optimises its objective functions (24) and (27). After the followers' responding strategies have been examined, the leader's optimal strategies $\beta^*$ and $\alpha_k^*$ form the set of leader's optimal strategies $U_L$.
Step 3: The set of leader's optimal strategies $U_L$ is taken as the updated leader's strategies for the followers to optimise their own objective functions. Through iterations between the leader level and the followers level, the procedure converges to the Nash equilibrium of the Stackelberg game-theoretic model.
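The three steps above can be illustrated with a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: a single generator follower with quadratic costs and fixed demand, a scalar leader objective (negative generation cost minus emission damage), all numerical coefficients, and a simple grid refinement that stands in for the BL-MOIA search.

```python
import numpy as np

# All constants below are hypothetical illustration values.
BETA_MAX = 50.0        # GBP/tCO2, upper bound of the leader's strategy space
DEMAND = 100.0         # MWh, fixed demand to be served
C_F, A_F = 40.0, 0.1   # linear/quadratic cost coefficients of the fossil unit
C_R, A_R = 50.0, 0.2   # linear/quadratic cost coefficients of the renewable unit
EMIS = 0.5             # tCO2/MWh emitted by the fossil unit
DAMAGE = 35.0          # GBP/tCO2, assumed social cost of emissions


def follower_response(beta):
    """Step 1: fossil output that minimises generation cost plus carbon cost."""
    p_f = (C_R + A_R * DEMAND - C_F - beta * EMIS) / (A_F + A_R)
    return float(np.clip(p_f, 0.0, DEMAND))


def generation_cost(p_f):
    """Cost of serving DEMAND with p_f from fossil and the rest from renewables."""
    p_r = DEMAND - p_f
    return C_F * p_f + 0.5 * A_F * p_f**2 + C_R * p_r + 0.5 * A_R * p_r**2


def leader_objective(p_f):
    """Step 2: toy welfare, i.e. minus (generation cost + emission damage)."""
    return -(generation_cost(p_f) + DAMAGE * EMIS * p_f)


def solve_stackelberg(n_grid=11, tol=1e-3, max_iter=50):
    """Iterate between leader and follower until the carbon price settles."""
    lo, hi = 0.0, BETA_MAX
    beta_star = None
    for _ in range(max_iter):
        betas = np.linspace(lo, hi, n_grid)
        responses = [follower_response(b) for b in betas]   # Step 1
        values = [leader_objective(p) for p in responses]   # Step 2
        new_beta = float(betas[int(np.argmax(values))])
        if beta_star is not None and abs(new_beta - beta_star) < tol:
            beta_star = new_beta
            break
        beta_star = new_beta
        # Step 3: feed the updated leader strategy back as a narrower search range.
        step = (hi - lo) / (n_grid - 1)
        lo, hi = max(0.0, beta_star - step), min(BETA_MAX, beta_star + step)
    return beta_star, follower_response(beta_star)


beta_star, p_fossil_star = solve_stackelberg()
print(beta_star, p_fossil_star)
```

In this toy setting the leader's optimal carbon price coincides with the assumed damage cost, which gives a simple check that the leader-follower iteration behaves as expected.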

Benchmark forecasting
An optimal policy-making process is built on a precise forecast of current power generation and consumption. The aforementioned $P_k^0$ and $P_{g,u}^0$ need to be accurately predicted as benchmarks for low carbon energy market scheduling to avoid the deviation of the optimal policy from realistic situations. $R_k^0$ can then be calculated based on $P_k^0$. On the power supply side, the prices of coal, smokeless fuels, and heating oils are considered as impact factors of $P_{g,u}^0$. On the power demand side, average temperatures and electricity prices are considered as impact factors of $P_k^0$. The relationship between forecasting objectives and impact factors is described through linear regressive functions on the generation and consumption sides, respectively [33]. Denote $x$ and $y$ as correlated random variables representing impact factors and forecasting objectives, respectively, such that their dependency is expressed as where $y(t)$ is the value of the forecasting objective at time $t$, $x_u(t)$ is the value of the $u$th impact factor at time $t$, and $\eta_u$ is the corresponding regression coefficient, which is estimated by minimising the squared difference between the estimated value $\hat{y}(t)$ and the actual value $y(t)$ as [34] min where $\varepsilon$ is the estimation error between the estimated value and the actual value. Furthermore, considering the stochastic fluctuation of impact factors, randomness and uncertainties need to be introduced in the forecasting process. Unlike the uncertainty model in [35], which employs parametric estimation to establish the distribution of uncertain variables, we use kernel density estimation [36] as a nonparametric method to estimate the probability density function (pdf) of uncertain impact factors. Kernel density estimation is capable of precise estimation because it is generated directly from historical observations without any assumption about parameters.
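The least-squares estimation of the regression coefficients described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the data are synthetic, the model is assumed to have no intercept term, and the names `fit_regression`, `true_eta` and `eta_hat` are hypothetical.

```python
import numpy as np


def fit_regression(X, y):
    """Least-squares estimate of eta in y(t) ~ sum_u eta_u * x_u(t).

    X has one row per time step t and one column per impact factor u.
    """
    eta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return eta


# Synthetic example: two impact factors (e.g. temperature and price), 200 hours.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_eta = np.array([1.5, -0.7])                  # assumed "true" coefficients
y = X @ true_eta + 0.01 * rng.normal(size=200)    # small estimation error

eta_hat = fit_regression(X, y)
y_hat = X @ eta_hat                               # estimated values
print(eta_hat)
```

Using `numpy.linalg.lstsq` minimises the sum of squared differences between the estimated and actual values directly, so no explicit optimisation loop is needed for this step.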
Furthermore, the randomness of impact factors is introduced through Latin hypercube sampling [37]. Compared with the Monte Carlo method [38], Latin hypercube sampling avoids the overconcentration issue through space-filling, which means that the samples are generated over the entire feasible range of uncertain impact factors. Besides, the Monte Carlo method requires relatively longer computing time because of its slow convergence.
$x_1, x_2, \ldots, x_i$ represent the samples of impact factors up to the $i$th sample, and $f(x)$ is the corresponding density function for each sample. For samples $x_i$, $i = 1, \ldots, n$, acquired from historical observations with an unknown density function, the density function is obtained through where $\hat{f}(x)$ is the estimated kernel density function, $n$ is the number of samples, $h$ is the bandwidth smoothing parameter, and $K(\cdot)$ is the kernel function. In our research, the Gaussian kernel is used because of its relatively high efficiency and simple mathematical form [39]. After the kernel functions generated from each sample point are combined, the kernel density function is acquired. The cumulative distribution function of $x$ is represented by $\Phi(x)$. Assume the desired sample size is $Q$. The range of $\Phi(x)$ is equally divided into $Q$ intervals with the same probability of $1/Q$. Subsequently, a value is randomly selected in each interval to generate samples. The $q$th ($q = 1, \ldots, Q$) sample $x$ can be calculated by $\Phi^{-1}$ evaluated at $\Phi = (1/Q)\, r_n + (q - 1)/Q$, where $r_n \sim U(0, 1)$ is a random variable following the uniform distribution on $(0, 1)$.
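The combination of Gaussian kernel density estimation and Latin hypercube sampling described above can be sketched as follows. This is a hedged illustration only: the standard kernel density form $\hat{f}(x) = \frac{1}{nh}\sum_i K((x - x_i)/h)$ is assumed, the inverse of $\Phi$ is approximated numerically on a grid, the bandwidth uses a common rule of thumb that the paper does not specify, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def gaussian_kde_pdf(x, samples, h):
    """Kernel density estimate f_hat(x) = 1/(n h) * sum_i K((x - x_i) / h)."""
    u = (np.asarray(x, dtype=float)[..., None] - samples) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel K
    return kernel.sum(axis=-1) / (len(samples) * h)


def latin_hypercube_from_kde(samples, h, Q, rng, grid_size=2001):
    """Draw Q stratified samples from the KDE by inverting its numerical CDF."""
    samples = np.asarray(samples, dtype=float)
    grid = np.linspace(samples.min() - 4.0 * h, samples.max() + 4.0 * h, grid_size)
    pdf = gaussian_kde_pdf(grid, samples, h)
    # Cumulative trapezoidal integration gives the CDF Phi on the grid.
    increments = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(increments)))
    cdf /= cdf[-1]
    # One uniform draw per equal-probability interval: r/Q + (q - 1)/Q.
    q = np.arange(1, Q + 1)
    r = rng.uniform(size=Q)
    probs = r / Q + (q - 1) / Q
    # Inverse CDF by linear interpolation of (Phi(x), x) pairs.
    return np.interp(probs, cdf, grid)


rng = np.random.default_rng(1)
history = rng.normal(10.0, 2.0, size=300)          # synthetic "historical" factor
h = 1.06 * history.std() * len(history) ** (-0.2)  # rule-of-thumb bandwidth (assumed)
lhs_samples = latin_hypercube_from_kde(history, h, Q=10, rng=rng)
print(lhs_samples)
```

Because exactly one value is drawn from each of the $Q$ probability intervals, the resulting samples cover the whole range of the estimated distribution, which is the space-filling property the text attributes to Latin hypercube sampling.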

Algorithm
For the followers-level objective functions, because the monetary compensation (15) is a piecewise function, the objective function of consumers becomes non-convex. Existing deterministic methods, such as gradient-based algorithms and non-linear programming, are liable to obtain sub-optimal solutions. This issue is demonstrated in our case studies. This motivates us to utilise an intelligent algorithm to find the global optimal solutions, which exist in different dimensions of objective functions. In addition, the objectives at the leader level and the followers level each lead to their own multiobjective optimisation problem (MOP). Chiu et al. [40] proposed an MOIA to solve the MOP. Nonetheless, the iterative optimisation between leader level and followers level cannot be modelled by the original MOIA, which means that the responding strategies from followers cannot be evaluated by the leader in a feedback way. Therefore, we extend the original MOIA to an iterative BL-MOIA to solve this MOP. The solution is obtained by using the leader-level BL-MOIA as shown in Algorithm 1 (see Fig. 3) and the follower-level BL-MOIA as shown in Algorithm 2 (see Fig. 4). The flowchart of solving this Stackelberg game model is shown in Fig. 5. For the calculations of fitness value and clone rate and the detailed algorithm, refer to the work of Chiu et al. [40]. Furthermore, Pareto optimality is introduced in order to discuss the characteristics of the algorithm.
Definition 3 (Pareto optimal set and Pareto front (PF)): All Pareto optimal solutions form the Pareto optimal set. The image of the Pareto optimal set through the objective functions $f_o$ is the PF.
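To make Definition 3 concrete, the sketch below extracts the Pareto optimal set from a small list of candidate solutions. It assumes the standard dominance relation with all objectives maximised (one solution dominates another if it is no worse in every objective and strictly better in at least one); the function name and the sample points are hypothetical and are not results from the paper.

```python
import numpy as np


def pareto_optimal_mask(objectives):
    """Return a boolean mask marking the non-dominated rows (maximisation)."""
    F = np.asarray(objectives, dtype=float)
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Row j dominates row i if it is >= in all objectives and > in one.
        dominates_i = np.all(F >= F[i], axis=1) & np.any(F > F[i], axis=1)
        if dominates_i.any():
            mask[i] = False
    return mask


# Hypothetical candidates scored on two objectives, e.g. (welfare, carbon reduction).
points = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 1.0], [2.0, 2.0], [1.0, 1.0]])
mask = pareto_optimal_mask(points)
pareto_front = points[mask]
print(pareto_front)
```

Here the last two candidates are dominated by `[2.0, 4.0]`, so only the first three rows remain on the front.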

Case studies
Case studies have been conducted to demonstrate the performance of the proposed model based on the current UK power system and energy market operations at one-hour intervals. Benchmark forecasting for generators and consumers is performed using UK historical data from National Statistics [41] from 2015 to 2017 at one-hour intervals. The UK government announced in the 2017 Energy Review that the required proportion of renewable penetration will increase each year, from 15.4% in 2015-2016 up to 20% by 2020-2021 [42]. This target is also applied in this study as the constraint on renewable percentage. The hourly wholesale electricity price in the UK national grid is used as the electricity price [43]. According to the EU historic carbon price [44], the carbon price is constrained between £0/tCO2 and £50/tCO2. The case studies are conducted in MATLAB on an Intel 3.20 GHz processor.

Distribution of CEF and monetary compensation rates
The proposed Stackelberg game-theoretic model is applied to the IEEE 30-bus test system and the IEEE 118-bus test system to compare the CEF and monetary compensation rates under different network complexities. The IEEE 30-bus test system consists of six generators, 41 branches, and 21 buses carrying loads, while the IEEE 118-bus test system consists of 54 generators, 186 branches, and 99 buses carrying loads [45]. Each of the six generators in the IEEE 30-bus test system is allocated a single type or multiple types of energy sources depending on system default capacities, and each of the 54 generators in the IEEE 118-bus test system is allocated a single type of electricity source in circular sequence, as presented in Table 1. The cost and carbon emission coefficients are calculated from the levelised values of projects commissioning in 2016 [31].
The chromatograms in Fig. 6 (IEEE 30-bus test system) and Fig. 7 (IEEE 118-bus test system) present (a and d) the CEF of branches, (b and e) the carbon emission loss of branches, and (c and f) the CEF of consumers, (a, b and c) before and (d, e and f) after scheduling. The corresponding monetary compensation rates for (a) the IEEE 30-bus test system and (b) the IEEE 118-bus test system are presented in Fig. 8. Each column denotes the distribution of CEF and monetary compensation rates across the overall power network for a given hour. The dark blue colour represents a lower value whereas the bright yellow colour represents a higher value. As shown in Figs. 6 and 7, although the overall distribution in each column varies between power network topologies, the trend of daily carbon distribution in each row is similar. It can be seen that, through scheduling, extremely high CEF of consumers and branches, and extremely high carbon emission loss of branches, at a specific bus or branch can be mitigated. The scheduling has a neutralising effect on carbon emissions across the overall power network and time horizon. The monetary compensation rates also vary in line with the CEF rates. The darkest blue colour means that monetary compensation rates are near zero, which is caused either by very low CEF rates or by CEF rates after scheduling that are higher than or equal to the ones before scheduling, as indicated in (15).
After low carbon energy market scheduling, the daily overall CEF of consumers in the IEEE 30-bus test system reduces by 25.47% (from 126.59 to 94.35 ktCO2), while that in the IEEE 118-bus test system reduces by 26.41% (from 137.34 to 101.07 ktCO2). This illustrates the potential of our model for consumption-side carbon reduction. During real-time energy market operation, if the information on CEF, power consumption, electricity payment bills and monetary compensation can be dynamically transmitted to consumers at a specific bus through smart meters and smart grid communication, the carbon footprint incurred by time-varying and region-varying consumption behaviours can be captured. Consumers can subsequently adjust their behaviours according to the optimal consumption suggested by our model.

Pareto optimal of leader and followers:
For clarity of demonstration, the following case studies use the IEEE 30-bus test system only. The PF of the trade-off between generating profits of generators and electricity payment bills of consumers at the followers level is presented in Fig. 9a. The PF of the trade-off between social welfare of the low carbon policy and carbon reduction effects at the leader level is presented in Fig. 9b. We select a one-hour optimisation as an example and compare the solution of BL-MOIA with the original MOIA, which optimises the leader-level and followers-level objective functions simultaneously without iterations between leader and followers. It can be seen that the PF at the followers level is non-convex. Existing deterministic methods such as gradient-based algorithms may therefore obtain sub-optimal solutions. When the gradient-based algorithm falls into convex region B instead of the global optimal region A, as shown in Fig. 9a, a sub-optimal solution will be found. Through iterations between leader and followers, the proposed BL-MOIA yields better performance than the original MOIA on both the leader's and the followers' PF. The convergence of BL-MOIA for (a) carbon price, (b) social welfare, and (c) carbon reduction is presented in Fig. 10. The BL-MOIA converges to optimal solutions within 50 iterations, even though iterations between the leader level and the followers level are involved.
By involving randomness and uncertainties, the average MAPE of benchmark forecasting is 1.90%, lower than that of conventional linear regression (2.60%). An example comparison of demand forecasts over a 150 h horizon is presented in Fig. 11.
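For reference, the MAPE figures quoted above can be computed as in the following sketch, which assumes the usual definition of the mean absolute percentage error. The demand values are made-up illustration data, not the paper's forecasts, and the function name is hypothetical.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(100.0 * np.mean(np.abs((actual - forecast) / actual)))


# Hypothetical hourly demand (MW) and its forecast.
actual_demand = [100.0, 200.0, 400.0]
forecast_demand = [98.0, 204.0, 400.0]
error_percent = mape(actual_demand, forecast_demand)
print(error_percent)
```

The three hourly errors here are 2%, 2% and 0%, so the average is about 1.33%.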

Low carbon energy market scheduling
We first compare our proposed Stackelberg game-theoretic model with existing methods for energy market scheduling. At the followers level, our model is reduced to the existing model in [18, 19] by removing carbon costs and monetary compensation (let $\beta \cdot e_{g,u}$ = £0/MWh and $M_k$ = £0/tCO2). At the leader level, the leader's objective functions are replaced by an emission target constraint as in [21]: $P_{g,u} \cdot e_{g,u} \le E_{\max}$, where $E_{\max}$ is the carbon emission target. We assume that the carbon reduction target is 5% of benchmark carbon emissions. Next, we compare our proposed model with the UK current carbon price from CPF through setting  Fig. 12. It can be seen that through either low carbon scheduling or CPF, daily carbon emissions are halved compared to the benchmark. The carbon target set in the existing method has almost the same carbon mitigation effect as CPF in driving fuel switching from fossil fuel-based generation to renewables, notably from coal and CCGT to wind and hydro energy. The CPF and the carbon target maintain a fixed carbon price or proportion of reduction, so their reduction effects stay the same irrespective of peak or off-peak time periods. By contrast, our proposed model achieves a more dramatic carbon mitigation in the peak time period from 15 to 19 h, because it realises a higher proportion of renewable energy sources during peak time (average 31.14%, compared to an average of 28.31% for existing methods and 28.79% for CPF). In addition, the comparison of daily total values of the objective functions is presented in Table 2.
The proposed model obtains better results in all dimensions, notably creating £4.51 m of social welfare as additional benefits, compared to £2.91 m for the existing method and £1.13 m for CPF. The scheduling also drives payment bills down for consumers and improves operating profits for generators. The aforementioned climate spread within the generators' objective will cause internal carbon reduction by generators through generation source switching. A power plant with non-renewable sources will either have to reduce emissions internally or bear the increased cost of purchasing more carbon allowances. From environmental and economic perspectives, the proposed model guarantees the trade-off between market efficiency and carbon abatement. With respect to the long-term effects of the proposed model, we compare the annual percentage of electricity generation by major sources for the benchmark, the proposed model, the existing model, and the UK CPF, as shown in Fig. 13. The proposed model realises the highest proportion of renewable energy sources (31.45%), whereas the CPF (29.47%) fails to deliver the UK's 30% carbon emission target for 2030 [5].

Conclusion
To mitigate carbon emissions and manage the inappropriate carbon price issue, this paper considers a low carbon policy design for energy market scheduling under a smart grid environment. This design differs from current low carbon policies and energy market scheduling methods by proposing a novel Stackelberg game-theoretic model. The underlying idea is to use advanced information and communication infrastructures to include the carbon cost in the generators' objective function and to formulate monetary compensation for the carbon reductions of consumers in a specific region and time period. The rationality of the low carbon policy is ensured by optimising the overall economic and environmental effects brought by the policy. Hence, a fair low carbon energy market scheduling is realised by efficiently solving the interactions between leader and followers of the Stackelberg game-theoretic model with BL-MOIA. As illustrated by simulation, when the carbon cost is considered in energy dispatch, fuel is switched from conventional sources to renewable energy sources. The proposed low carbon energy market scheduling model promotes 31.45% of electricity generation from renewable energy sources. The designed time- and region-specific monetary compensation scheme eliminates extremely high CEF of consumers and branches and extremely high carbon emission loss of branches, and contributes up to 26.41% mitigation of the carbon emissions caused by consumption behaviours. The proposed model outperforms existing low carbon policy schemes and energy market scheduling models in both the leader's and the followers' objective functions. Through the proposed model under the smart grid environment, the responsibilities for carbon reduction are allocated across the overall energy market. In future work, the environmental revenue of the carbon tax collected from generators needs to be reallocated in the energy market, not only to invest in low carbon technologies but also to compensate energy-intensive industries whose competitiveness may be harmed by the carbon price. Thus, how to fairly distribute this environmental revenue to specific industries needs to be investigated.

Acknowledgments
This work has received funding from the H2020 'TESTBED' project under grant number 734325, and the EPSRC 'TOPMOST' project under grant number EP/P005950/1.