Why electricity market models yield different results: Carbon pricing in a model-comparison experiment

Abstract
The European electricity industry, the dominant sector of the world's largest cap-and-trade scheme, is one of the most-studied examples of carbon pricing. In particular, numerical models are often used to study the uncertain future development of carbon prices and emissions. While parameter uncertainty is often addressed through sensitivity analyses, the potential uncertainty of the models themselves remains unclear from existing single-model studies. This study investigates such model-related uncertainty by running a structured model comparison experiment, which exposes five numerical power sector models to aligned input parameters, finding stark model differences. At a carbon price of 27 EUR/t in 2030, the models estimate that European power sector emissions will decrease by 36–57% when compared to 2016. Most of this variation can be explained by the extent to which models consider the market-driven decommissioning of coal- and lignite-fired power plants. Higher carbon prices of 57 and 87 EUR/t yield a stronger decrease in carbon emissions, by 45–75% and 52–80%, respectively. The lower end of these ranges can be attributed to the short-term fuel switch captured by dispatch-only models. The higher reductions correspond to models that additionally consider market-based investment in renewables. By further studying cross-model variation in the remaining emissions at high carbon prices, the representation of combined heat and power is identified as another crucial driver of differences across model results.


Introduction
The EU ETS and the power sector. The European Union's Emission Trading System (EU ETS) is the oldest and largest international emissions trading scheme in operation to date [1]. It covers about 40% of EU carbon emissions, with the power industry being the largest contributing sector [2]. Accordingly, a wealth of literature has studied the EU ETS and the role of the power sector therein. These studies inform policy makers on design questions on the EU ETS, on complementing national climate policies, such as renewable energy support and coal phase-outs [3,4], and on expanding the scope of the EU ETS to, for example, the transport sector [5,6].
Models behind the studies. Many of these studies are based on numerical partial equilibrium models, which can be distinguished into three groups based on their scope and technical detail. The first group of studies assumes a linear marginal abatement cost curve for both the power sector and energy-intensive industry [7][8][9][10]. The second group keeps abatement cost curves for the industry but explicitly considers investment and dispatch options in the power sector [11][12][13]. In these all-sector models, the modeling of the power sector is highly simplified. For example, models are often based on a few representative days, and some ignore the option to store and internationally exchange electricity. The third set of studies focuses on the power sector without explicitly representing the carbon market. These studies assume either a sector-specific carbon cap on electricity-related emissions [14][15][16] or a fixed carbon price [17][18][19]. Their modeling of the power sector is more detailed than in the preceding sets of studies; for example, dispatch is typically modeled in hourly granularity during a complete year. Such power sector models are the focus of the following model experiment.

Variation in results.
Studies on the future development of the EU ETS and its impact on the power sector have provided a broad range of results, which implies that readers and, notably, policy makers are facing considerable uncertainty regarding future developments and policy impacts. For example, recent studies using similar models and scenarios have yielded a wide range of carbon prices between 30 and 87 EUR/t for 2030 [11,12]. Part of this variation in results is related to scenarios and model input parameters, which many studies address through sensitivity analyses or through explicitly considering uncertain parameters in dynamic or stochastic programming, such as Monte-Carlo simulations [20,21]. However, as each model abstracts from reality, the divergence in results may also be related to the model formulation itself. For example, some existing power (and transport) sector studies have optimized dispatch and investment [5,6,14], while others have optimized dispatch only [15][16][17][18][19]. Moreover, some contributions have considered technical restrictions related to combined heat and power (CHP) provision [15,16], while others have refrained from modeling heat constraints [14]. How do such modeling choices affect results on carbon pricing? With existing studies relying on single models, this question has not been answered yet. Meanwhile, investigating the implications of varying modeling approaches for the assessment of carbon pricing seems informative for modelers who choose between the different approaches and for decision makers in energy policy and industry who interpret the results of different models.

Contribution.
We study this model-induced uncertainty by running a structured model comparison experiment, in which five numerical power sector models were exposed to aligned input parameters. All models optimize plant dispatch in time steps of hours to represent the partial equilibrium in the electricity market. Moreover, they all consider a relatively high degree of technical detail concerning thermal power plants (age-dependent efficiency, ramping, and cycling), renewable energy, hydroelectricity, electricity storage, cross-country interconnectors, system service provision, and the cogeneration of heat. In addition to dispatch decisions, some models also optimize investment and decommissioning decisions. While comparing multi-sector energy models has had a long tradition originating from the Energy Modeling Forum [22][23][24][25], this study is the first to compare power sector models in the context of the EU ETS.
Outline. The remainder of this paper is structured as follows. In Section 2, we introduce the theoretical framework and discuss the general effects of carbon pricing in the power sector. Section 3 describes the experimental setup, including the different models, the joint scenarios, and the evaluation approach. We subsequently present the results in Section 4 and conclude with a summary of the implications in Section 5.

Theoretical framework
Implications of carbon prices. Carbon prices affect power sector emissions via three different causal paths (Figure 1). First, because the variable cost of fossil-fueled generators increases proportionately to their specific carbon emissions, technologies switch their position in the merit order of supply such that, for example, gas-fired plants with lower emissions replace hard-coal- and lignite-fired plants with higher emissions (short-term fuel switch). Second, an increase in the variable cost of fossil-fueled generators increases the electricity price and, with it, the market value (capture price) of renewables, which may incentivize market-based investment in renewables, substituting for fossil-fueled generation. Third, contribution margins of hard-coal- and lignite-fired plants are disproportionately reduced because of reduced utilization (due to the short-term fuel switch) and increased variable cost, which may lead to their decommissioning and the vanishing of related emissions. In addition to the decommissioning of existing coal-fired plants, reduced contribution margins will also cause investment in dispatchable generators to shift away from those using hard coal and lignite toward those using natural gas (long-term fuel switch).
Figure 1: Causal paths from increased carbon prices to reduced power sector emissions
Short- versus long-term decisions. These causal paths relate to economic decisions with distinct time horizons. The first path concerns short-term operational decisions on the dispatch of power plants with a horizon from quarter hours to several days. Despite imperfect foresight on renewable infeed and electricity prices (aleatory uncertainty), there is little uncertainty about the basic pattern behind these decisions (epistemic uncertainty) [27]. As shown in the following, all of the compared models capture these decisions in a relatively similar manner.
By contrast, the second and third paths relate to long-term planning decisions on power plant investment and decommissioning with a horizon of years to decades. This comes along with much higher aleatory uncertainty but also some epistemic uncertainty. Related to these two types of uncertainties and to the challenges of (potential) model size and complexity, the models in the following experiment differ by whether and how these decisions are captured.
Fuel switch. The fuel switch in the short-term dispatch of power plants is quantified in Figure 2. For comparability, this figure is based on the same assumptions as those used in the following model experiment (see Appendix). For each technology, power plants of different ages and hence different efficiencies result in a range of marginal costs, displayed as the shaded area around the line of average marginal cost. 3 The different carbon price scenarios assessed in the model experiment are indicated with dashed lines. It can be seen how the marginal costs of the various fossil-fueled technologies increase at a different pace with increasing carbon prices, depending on their carbon intensity and their efficiency. As a result, combined cycle gas turbines (CCGT) become the least marginal-cost technology at higher carbon prices while being relatively expensive at low carbon prices. Note that, because of the wide marginal cost ranges, the fuel switch is a continuous process. For the example of the 57 EUR/t carbon price scenario, the average lignite-fired power plant is more expensive than CCGT, although some lignite-fired power plants are even cheaper.
Capacity adjustments. The capacity adjustments resulting from an increase in carbon prices are qualitatively illustrated in Figure 3. On the one hand, an increase in renewable capacity reduces the residual load, which leads to a ceteris paribus reduction in power generation from all other technologies, including generation and hence emissions from all fossil-fueled technologies. On the other hand, the premature decommissioning and non-replacement of hard-coal- and lignite-fired power plants leads to a reduction in power generation from these technologies, which is substituted by less carbon-intensive power generation from natural gas. Note that this figure ignores the fuel switch, which will further reduce emissions.
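The fuel switch described above can be reproduced with a minimal sketch of short-run marginal costs. The fuel prices are those stated for the experiment (25.9, 12.6, and 5.5 EUR/MWh_th for natural gas, hard coal, and lignite). The emission factors and efficiencies are illustrative round numbers assumed here, not the harmonized model inputs, and other variable costs are ignored.

```python
# Illustrative sketch: short-run marginal cost of fossil-fueled plants as a
# function of the carbon price. Fuel prices follow the experiment's stated
# assumptions; emission factors and efficiencies are ASSUMED round numbers.

FUEL_PRICE = {"natural_gas": 25.9, "hard_coal": 12.6, "lignite": 5.5}  # EUR/MWh_th
EMISSION_FACTOR = {"natural_gas": 0.20, "hard_coal": 0.34, "lignite": 0.40}  # t CO2/MWh_th (assumed)
EFFICIENCY = {"natural_gas": 0.58, "hard_coal": 0.42, "lignite": 0.38}  # MWh_el per MWh_th (assumed)


def marginal_cost(fuel: str, carbon_price: float) -> float:
    """Fuel plus carbon cost in EUR/MWh_el; other variable costs are ignored."""
    thermal_cost = FUEL_PRICE[fuel] + EMISSION_FACTOR[fuel] * carbon_price
    return thermal_cost / EFFICIENCY[fuel]


def merit_order(carbon_price: float) -> list[str]:
    """Fuels sorted from cheapest to most expensive at the given carbon price."""
    return sorted(FUEL_PRICE, key=lambda fuel: marginal_cost(fuel, carbon_price))


# Print the merit order for the three 2030 carbon price scenarios (EUR/t).
for price in (27, 57, 87):
    print(price, merit_order(price))
```

With these assumed parameters, lignite leads the merit order at 27 EUR/t, whereas natural gas leads at 57 and 87 EUR/t. Because the sketch uses one average efficiency per fuel, it shows a discrete switch; the age-dependent efficiency ranges make the switch gradual.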
3 Besides their age, the efficiency of power plants depends on other factors (size, available cooling options, cogeneration of heat, etc.). While some of these aspects are considered in the following model experiment, they are neglected in Figure 2 for simplicity. Note that the range of marginal cost is relatively small for gas-fired power plants because of their shorter lifetime when compared to power plants running on hard coal and lignite. All the gas-fired generators that will remain in 2030 will be relatively young and have similar efficiencies.
4 The assumed fuel prices are 25.9 EUR/MWh_th for natural gas, 12.6 EUR/MWh_th for hard coal, and 5.5 EUR/MWh_th for lignite.
Moderators for the impact of carbon prices. The impact of carbon prices on carbon emissions is influenced by other factors, both in the short and long term. The most important factor is probably the price of fuel. For example, high natural gas prices may impede, while low natural gas prices may amplify, the fuel switch with coal and lignite, all else being equal. Another exemplary moderator, which we analyze in more detail in the following, is the cogeneration of heat and power. Because CHP plants must fulfill a given heat demand, they may be dispatched, even though an increase in carbon prices causes their marginal costs to be higher than those of less carbon-intensive technologies. Furthermore, the heat-driven must-run generation of CHP plants tends to depress electricity prices, including the market value of renewables. As a result, CHP may impede market-based investment in renewables even at higher carbon prices. Cogeneration may also affect capacity adjustments between hard coal, lignite, and natural gas, but it is ex ante unclear in which direction. While CHP capacity is more expensive to replace, the savings from a replacement may also be higher because of the higher utilization driven by the heat demand. The impact of CHP will be quantitatively assessed in the following model experiment.
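The moderating effect of heat-driven must-run generation can be illustrated with a toy single-hour dispatch. This is only a sketch under strong simplifications: it uses a greedy merit-order rule rather than the optimization applied by the compared models, all plant data are hypothetical, and must-run output is assumed not to exceed demand.

```python
# Illustrative sketch: greedy merit-order dispatch for one hour in which CHP
# units must produce a minimum output to cover heat demand. All plant data
# are hypothetical; the compared models use full optimization instead.
from dataclasses import dataclass


@dataclass
class Plant:
    name: str
    capacity: float            # MW_el
    marginal_cost: float       # EUR/MWh_el, including the carbon cost
    specific_emissions: float  # t CO2/MWh_el
    must_run: float = 0.0      # MW_el forced online by heat demand (0 if not CHP)


def dispatch(plants: list[Plant], demand: float) -> dict[str, float]:
    """Serve must-run output first, then fill remaining demand by merit order."""
    output = {p.name: min(p.must_run, p.capacity) for p in plants}
    remaining = demand - sum(output.values())
    for plant in sorted(plants, key=lambda p: p.marginal_cost):
        if remaining <= 0:
            break
        extra = min(plant.capacity - output[plant.name], remaining)
        output[plant.name] += extra
        remaining -= extra
    return output


def emissions(plants: list[Plant], output: dict[str, float]) -> float:
    """Hourly emissions in t CO2."""
    return sum(p.specific_emissions * output[p.name] for p in plants)


# Hypothetical two-plant system at a high carbon price: gas is cheaper than coal.
gas = Plant("gas", capacity=100.0, marginal_cost=65.0, specific_emissions=0.35)
coal_chp = Plant("coal", capacity=100.0, marginal_cost=76.0,
                 specific_emissions=0.80, must_run=40.0)
coal_free = Plant("coal", capacity=100.0, marginal_cost=76.0, specific_emissions=0.80)

with_chp = dispatch([gas, coal_chp], demand=100.0)
without_chp = dispatch([gas, coal_free], demand=100.0)
print(emissions([gas, coal_chp], with_chp), emissions([gas, coal_free], without_chp))
```

In this example, the must-run constraint keeps 40 MW of coal online although gas alone could serve demand at lower cost, so hourly emissions are higher with the heat constraint than without it.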
Further moderating factors for the impact of carbon prices include reserve provision and ramping constraints. Although these are accounted for in the investigated models, an in-depth analysis of their moderating effects on the impact of carbon prices is beyond the scope of this paper.
Other drivers of carbon emissions. While the above moderators influence how carbon emissions are affected by carbon prices, it is important to note that other factors influence carbon emissions irrespective of carbon prices. Even without increased carbon prices, renewable support policies will cause some renewable capacity to be installed, which will reduce carbon emissions. Furthermore, some coal capacity will be retired upon reaching the end of its lifetime and may, even at low carbon prices, be replaced by less carbon-intensive technologies. This effect may be amplified by coal phase-out policies. By contrast, the retirement of nuclear capacity may lead to a ceteris paribus increase in carbon emissions. To realistically assess the impact of increased carbon prices in the following experiment, these other influencing factors on carbon emissions are considered equally across all models and scenarios.

Experimental setup
Outline. This section describes the setup of the model comparison experiment. Subsection 3.1 introduces the utilized models, Subsection 3.2 provides an overview of the joint scenarios, and Subsection 3.3 explains how the model results are evaluated.

Models
Overview. This article compares five electricity market models: the PowerFlex model (PFL) [28], the joint market model (JMM) [29], the SCOPE model (SCO) [30], the electricity market model EMMA (EMM) [31], and the DIMENSION model (DIM) [32]. Each model has frequently been used for academic research and policy advice [6,33-46]. They all optimize the dispatch of power plants, the operation of storage, and cross-border trade in an hourly resolution over one year. In addition, some models optimize the investment in and the decommissioning of power plants and storage, as summarized in Table 1. Further key model differences concern the techno-economic details associated with power plant inflexibility and efficiency, as well as geographical scope. These differences are discussed in more detail in the following paragraphs.
a Note that start-up costs and cycling constraints have been added to EMM after the previous experiment on the German coal phase-out (Pöstges et al. [47]).
b This is with the exception of Austria and the Netherlands, for which JMM also considers heat networks.
c Including the cost for fuel consumption. By contrast, EMM models the additional fuel consumption as decreased efficiency, accounting not only for the cost but also for the emissions of start-up-related fuel consumption.
Dispatch modeling. All five models mentioned here optimize the hourly dispatch for conventional power plants using nuclear, lignite, hard coal, natural gas, and oil as primary energy sources. In addition, the dispatch of pumped hydro storage and hydro reservoirs is endogenously modeled. The availability of the above technologies is considered based on typical outage probabilities. The models differ in the level of detail by which power plants are implemented. On the one hand, EMM, DIM, and JMM aggregate power plants based on the primary energy source, engine type, and commissioning year into vintage classes, of which the dispatch is modeled in a simplified, linear manner. On the other hand, SCO and PFL consider single power plants, with SCO applying mixed-integer unit commitment and PFL applying linear modeling. In any case, carbon prices, which are at the core of this study, are considered in the dispatch optimization as a mark-up on variable fuel, operation, and maintenance costs. For other energy sources, the dispatch is fixed to exogenous time series (see Appendix).

Investment modeling.
While the dispatch-only models, PFL and JMM, make exogenous assumptions on the installed capacity, the investment models, SCO, EMM, and DIM, only take existing capacity and committed investments as given and allow endogenous capacity additions for certain technologies (see Appendix). In addition to endogenous investment, EMM and DIM allow for the endogenous decommissioning of power plants that become unprofitable. While EMM simultaneously optimizes capacity and dispatch decisions, SCO and DIM consist of separate investment models and subsequent dispatch models. Note for SCO that mixed-integer constraints are considered only when optimizing dispatch. Furthermore, DIM determines the intertemporally optimal capacity mix, which means that investment and decommissioning in 2030 consider the evolution of post-2030 revenues. DIM is also the only model to feature capacity constraints to ensure adequate investment in each country.

Inflexibility of conventional power plants.
All models consider in detail the constraints and characteristics of conventional power plants, although these constraints differ in their exact implementation (Table 1). For the example of CHP, some models include heat constraints for single units, heat networks, or vintage classes (JMM, SCO, and EMM), while others aggregate heat constraints at the national level (PFL and DIM). Furthermore, SCO is the only model to consider electric boilers, which can relax the CHP must-run constraint. PFL and DIM take into account CHP only in Germany, and JMM and SCO aggregate CHP outside Germany at the vintage or national level. Other model differences concern whether start-up costs and constraints on minimum operating time and downtime are implemented. There are even differences in the implementation of reserve energy and ramping constraints, which are covered by all models (for a detailed discussion, see Pöstges et al. [47]).

Efficiency of conventional power plants.
For our focus on power sector emissions, the efficiency of power plants is of particular interest. While all models assume the same technology-specific efficiency ranges, they differ by how these assumptions are considered (Table 1).

Scenarios
Overview. All of the above models were exposed to the same set of scenarios. We modeled the year 2030 with different levels of carbon pricing: 27, 57, and 87 EUR/t. In addition, we simulated the year 2016 based on historical input data as a reference (with a carbon price of 5.2 EUR/t). As this modeling experiment focuses on the power sector, we set the carbon price as an exogenous input parameter. This is a simplification to isolate the modeling of the power sector from the real-world complexity of carbon pricing in the EU ETS.
Harmonized input parameters. Table 2 summarizes the other model input parameters. The large majority and the most important of them were harmonized across the models, while those parameters that are specific to the different model implementations were not. A more detailed description of the scenarios and assumptions can be found in the Appendix.
Sensitivity without CHP. In addition to our main scenario, we ran a sensitivity analysis without CHP. In this hypothetical scenario, we ensured that neither dispatch nor capacity decisions were affected by CHP. The aim of this sensitivity run is to identify the effect of CHP and of the differences in modeling CHP on the estimated relationship between carbon prices and power sector emissions.

Metrics and scope
Emissions and generation. The focus of our evaluation is on power system emissions, as reported by the individual models. 6 Model differences in terms of emissions are expected to be driven by the generation mix (Section 2) and by the different implementations of power plant efficiency (Subsection 3.1). Against this background, we evaluated annual generation quantities by fuel, while fuels with fixed annual generation were aggregated into the categories "RE Fixed" and "Fossil Fixed" (see Appendix). Furthermore, we analyzed the specific emissions per MWh_el for each fuel.

Investment and decommissioning.
For the models that consider investment (SCO, DIM, and EMM) and decommissioning (DIM and EMM), we also evaluated endogenous capacity adjustments. This was limited to market-based investments that exceed the politically targeted expansion of renewables and the already committed investment in conventional power plants. Moreover, only market-based decommissioning was analyzed, excluding planned decommissioning or age-based retirement.

Market value.
To better understand the role of market-based investment in renewables and related model differences, we evaluated their market value, which is their revenue earned in markets without any form of subsidy [33,48]. Market-based investments become profitable whenever the market value exceeds the corresponding levelized cost of electricity. 7 In the market equilibrium, the market value is equal to the levelized cost. In the context of our model experiment, we expect carbon prices to drive the market value of renewables through the variable cost of price-setting fossil-fueled generators, and we are interested in how this effect may be captured differently across models.
6 To minimize the impact of CHP and complementing heat boilers on the results, the models estimate electricity-related emissions as if heat production would not affect electricity-related emissions.
7 The levelized cost of electricity is calculated based on the harmonized input assumptions on investment cost, fixed operation and maintenance cost, interest rate, lifetime, and the annual capacity factors from the renewable profiles.
8 In addition to the 26 countries within the assessment scope, DIM and SCO include Croatia and Slovenia, and JMM includes Croatia, Slovenia, and Bosnia and Herzegovina. Note that DIM neglects Luxembourg, which accounts for only 0.2% of the electricity demand across the model scope.
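The comparison of market value and levelized cost can be sketched as follows. This is a minimal illustration, not the evaluation code used in the experiment: the annuity-based LCOE formula is a common textbook form assumed here, it uses only the input categories named above (investment cost, fixed operation and maintenance cost, interest rate, lifetime, capacity factor), and all function and parameter names are hypothetical.

```python
# Illustrative sketch: market value (capture price) of a renewable technology
# versus its levelized cost of electricity (LCOE). The annuity-based LCOE is
# an ASSUMED textbook formulation; it requires a positive interest rate.

def market_value(prices: list[float], generation: list[float]) -> float:
    """Generation-weighted average electricity price in EUR/MWh."""
    revenue = sum(p * g for p, g in zip(prices, generation))
    return revenue / sum(generation)


def annuity_factor(interest_rate: float, lifetime: int) -> float:
    """Converts an upfront investment into equal annual payments."""
    growth = (1 + interest_rate) ** lifetime
    return interest_rate * growth / (growth - 1)


def lcoe(invest_cost: float, fixed_om: float, interest_rate: float,
         lifetime: int, capacity_factor: float) -> float:
    """LCOE in EUR/MWh for a plant without variable cost.

    invest_cost is in EUR/kW and fixed_om in EUR/kW per year.
    """
    annual_cost_per_mw = 1000 * (invest_cost * annuity_factor(interest_rate, lifetime) + fixed_om)
    annual_mwh_per_mw = capacity_factor * 8760
    return annual_cost_per_mw / annual_mwh_per_mw


def is_profitable(prices: list[float], generation: list[float], levelized_cost: float) -> bool:
    """Market-based investment pays off if the market value covers the LCOE."""
    return market_value(prices, generation) >= levelized_cost
```

Here, `prices` and `generation` would be hourly series for the modeled year. In a market equilibrium with endogenous investment, capacity is added until the market value has fallen to the LCOE.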

Results
Outline. This section presents and discusses the results. Subsection 4.1 gives an overview of the model results and model differences at the European level. Two observed differences are analyzed in more detail afterward: the role of endogenous investment in renewables using the example of France (Subsection 4.2) and the role of CHP (Subsection 4.3).

Response of the European electricity sector to increasing carbon prices
Introduction. The main results are summarized in Figure 4. The carbon emissions and electricity generation are displayed for each model, disaggregated by fuel and aggregated over all 26 countries ("26C"). In the following, we discuss the model results for 2016 and 2030 at different carbon prices.
[...] [49]. At the fuel-specific level, however, more heterogeneity emerges. For example, even though the overall emissions reported by PFL and SCO differ by less than 1%, hard-coal-related emissions differ by 31%. Part of the differences in the modeled emissions can be traced back to the corresponding differences in the electricity generation mix (Figure 4e). For example, the relatively high natural gas-related emissions in SCO when compared to PFL are in line with SCO's relatively high natural gas-fired electricity generation. However, some remaining differences in model-related emissions cannot be explained by generation. For example, DIM features much lower emissions than SCO, even though both models yield very similar generation mixes.
Differences in specific emissions. The relationship between electricity generation and emissions is disentangled in Figure 5. This figure displays the two factors driving total emissions on the two axes: electricity generation and specific emissions. It is revealed that differences in natural gas-related emissions are mostly due to differences in electricity generation. Hence, these emissions are directly related to how much electricity the models decide to be produced from natural gas. By contrast, differences in lignite-related emissions are mostly due to differences in the specific emissions. Thus, even though the models agree on how much electricity was produced from lignite, they disagree on the related emissions. This explains, for example, the much higher emissions of SCO when compared to DIM, despite their similar generation mix: SCO features higher specific emissions for lignite and hard coal. Such large differences in specific emissions are surprising because all models use the same assumption on fuel-specific emission factors and similar assumptions on the rated efficiency of power plants. We can think of two possible explanations for this: first, models make different assumptions on how the actual efficiency differs from the rated efficiency due to part-load operation, ramping, and start-up (Table 1). Second, power plant restrictions (e.g., related to the provision of heat) may cause less efficient power plants to be in operation compared to a pure merit order dispatch. These two effects can amplify each other; in the case of SCO, plant-specific heat constraints often force power plants to run at part-load with lower efficiency, which causes higher emissions relative to DIM, where national heat constraints require less part-load operation. The impact of CHP will be further investigated in Subsection 4.3.
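The decomposition of emissions into generation and specific emissions can be written out in a short sketch. The numbers below are hypothetical and serve only to show how two models with identical generation and identical fuel emission factors can still report different emissions when the effective efficiency differs.

```python
# Illustrative sketch: fuel-related emissions as the product of electricity
# generation and specific emissions. All numbers are hypothetical.

def specific_emissions(emission_factor: float, efficiency: float) -> float:
    """t CO2 per MWh_el, from the fuel emission factor in t CO2 per MWh_th."""
    return emission_factor / efficiency


def total_emissions(generation_twh: float, emission_factor: float, efficiency: float) -> float:
    """Emissions in Mt CO2 (TWh_el times t CO2/MWh_el gives Mt CO2)."""
    return generation_twh * specific_emissions(emission_factor, efficiency)


# Same generation and emission factor, different effective efficiency
# (for example, because of part-load operation in one of the models).
model_a = total_emissions(generation_twh=200.0, emission_factor=0.40, efficiency=0.40)
model_b = total_emissions(generation_twh=200.0, emission_factor=0.40, efficiency=0.36)
print(model_a, model_b)
```

In this hypothetical case, a drop in effective efficiency from 40% to 36% raises emissions by about 11% at unchanged generation.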

Large reduction even at a low carbon price in 2030.
Turning toward the emissions in 2030, we find that, at the lowest considered carbon price of 27 EUR/t, the emissions in all models are substantially reduced to 400-640 Mt (Figure 4b). This is 36-57% less than the modeled emissions in 2016 and within the wide range of results from previous studies with similar carbon prices (300-800 Mt [12,14]). The large scale of this reduction may seem remarkable at such a low carbon price but can be easily explained by the exogenously fixed increase in renewable electricity generation ("RE Fixed" in Figure 4f). Driven by this politically defined expansion of renewables, emissions decrease across all models and for all fossil fuels, including less carbon-intensive natural gas. This is in line with Section 2, where we found 27 EUR/t to be too low for a substantial fuel switch.
Model differences at low carbon prices. Perhaps more surprising is the large variance across the model results at this low carbon price. Most of this variance can be traced back to the different model types.
The highest emissions are found for the dispatch-only models JMM and PFL. In SCO, which also considers endogenous investment, the somewhat lower emissions are due to renewable generation from market-based investment in wind power ("Wind Market" in Figure 4f), which exceeds the politically defined renewable expansion ("RE Fixed"). EMM and DIM, which model both investment and decommissioning endogenously, feature the lowest emissions. Figure 4f reveals that this is related to lower hard-coal- and lignite-fired electricity generation and, in the case of DIM, a larger market-based investment in wind power.
Investment and decommissioning at low carbon prices. Figure 6 provides a direct view of endogenous investment and decommissioning in capacity terms for the year 2030. Most of the endogenous investments are related to wind power, while the magnitude of investments differs across the models. Interestingly, these cannot be explained by the model types: DIM features a much higher investment than EMM, even though both models equally consider endogenous investment and decommissioning. These differences will be further explored in Subsection 4.2. In addition to the market-based expansion of wind power, all investment models consider endogenous investments in natural gas-fired power plants. Note that in EMM and SCO, these natural-gas-fired power plants are built to produce not only electricity but also heat, while in DIM, they are solely used to produce electricity. The share of CHP in natural gas investments is about 75% in EMM and about 55% in SCO. In EMM and DIM, hard-coal- and lignite-fired power plants are decommissioned, which explains the lower emissions in these models when compared to SCO. In DIM, this effect is amplified by the decommissioning of inefficient natural gas-fired power plants in Italy and Spain.
Higher carbon prices in 2030. At higher carbon prices, the differences in the modeled 2030 emissions increase even more (Figure 4c-d). This heterogeneity can again be associated with the different model types. Relatively little reduction in carbon emissions is found for the dispatch-only models, PFL and JMM (about 100 Mt per 30 EUR/t increase in the carbon price). This reduction can be attributed to the fuel switch: hard-coal- and lignite-fired generation is gradually substituted by natural gas-fired generation. This leads to a substantial decrease in hard-coal- and lignite-related emissions at the expense of a smaller increase in natural gas-related emissions (Figure 4g-h).

Investment model results at higher carbon prices.
For the investment models, the initial emission reduction is about twice as large as for the dispatch-only models (about 200 Mt when the carbon price increases from 27 to 57 EUR/t). With even higher carbon prices, the further decrease in carbon emissions is similar to that in the dispatch-only models (about 100 Mt for another 30 EUR/t increase in the carbon price to 87 EUR/t). Figure 5g-h and Figure 6 reveal that the initially more pronounced decrease in carbon emissions is linked to further endogenous investment in renewables and, for EMM and DIM, to further endogenous decommissioning of hard-coal- and lignite-fired power plants in the 57 EUR/t scenario. In the 87 EUR/t scenario, however, these effects do not increase much further. This may be due to a minimum must-run of fossil generators related to the provision of heat and balancing reserves. The provision of heat as a possible explanation is analyzed further in Subsection 4.3. Nevertheless, the endogenous investment in wind power at 87 EUR/t is substantial, reaching 200-316 GW on top of the 170-GW exogenous wind power expansion according to national targets. In total, this means yearly investments of 26-35 GW, which is much more than the 12 GW built annually during 2011-2020.
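The yearly investment figures can be retraced with simple arithmetic. The sketch below assumes that the total expansion is spread evenly over the 14 years between the 2016 reference year and 2030; this period is an assumption made here because it reproduces the stated range, not a detail confirmed by the text.

```python
# Illustrative arithmetic: average yearly wind power additions implied by the
# 87 EUR/t results. The 14-year period (2016 to 2030) is an ASSUMPTION.

def annual_build_rate(endogenous_gw: float, exogenous_gw: float, years: int) -> float:
    """Average yearly capacity additions in GW."""
    return (endogenous_gw + exogenous_gw) / years


YEARS = 2030 - 2016          # assumed build-out period
EXOGENOUS_GW = 170.0         # expansion according to national targets

low = annual_build_rate(200.0, EXOGENOUS_GW, YEARS)
high = annual_build_rate(316.0, EXOGENOUS_GW, YEARS)
print(round(low), round(high))
```

Under this assumption, the lower and upper bounds round to 26 and 35 GW per year, matching the range given above.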
Remaining emissions at high carbon prices. In summary, the differences in the aggregated emissions at higher carbon prices are even larger than those at lower carbon prices: 240-540 Mt at 57 EUR/t and 190-490 Mt at 87 EUR/t (45-75% and 52-80% less than in 2016, respectively). These wide ranges put previous estimates from single models into perspective, for example, 300 Mt at 87 EUR/t from Bruninx et al. [11]. Furthermore, there is substantial heterogeneity in fuel-specific emissions. These can be linked to differences in electricity generation between the models but cannot be explained by the different model types. For example, SCO has the highest hard-coal- and lignite-related emissions at 87 EUR/t despite endogenous investment in renewables. Among other restrictions, these counterintuitive observations may be related to CHP, which is further analyzed in Subsection 4.3.

Market-based investment in renewables
Case study introduction. As identified in the previous subsection, market-based investment in renewables is a key factor to reduce carbon emissions. However, the models featuring endogenous investment differ in their results concerning the level of investment. To better understand what causes these differences, this subsection analyzes in more detail onshore wind power investment using the example of France. This case study was chosen due to the comparatively high wind potential in the area and because the French results are representative of the overall European results regarding the following two characteristics: first, endogenous investment in renewables increases with the carbon price; second, DIM has consistently higher investment than SCO and EMM (Figure 7).
Market values: approach. To explain the model differences, we contrast the results for endogenous onshore wind power investment with the market value of onshore wind power for the example of France. In partial equilibrium models, which are compared in this study, the market value equals the levelized costs of electricity (LCOE), which are harmonized across the models (Subsection 3.3). In SCO and EMM, the equilibrium market value is based on the hourly wholesale electricity prices in 2030. In DIM, the equilibrium condition is more complex due to the previously mentioned model particularities (Subsection 3.1): first, because of the intertemporal optimization, not only the market value in 2030 but also future market values need to be considered by calculating the discounted average market value over the power plants' lifetimes. Second, because of the capacity constraint, not only the energy value but also the capacity value of wind energy must be included.
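The intertemporal equilibrium condition can be sketched as a discounted average of yearly energy and capacity values. This is an illustration under simplifying assumptions made here (one value per modeled year, constant annual generation, a single discount rate); it is not the formulation implemented in DIM, and the function name and inputs are hypothetical.

```python
# Illustrative sketch: discounted average of energy plus capacity value over
# the modeled years. ASSUMES one value per year, constant annual generation,
# and a single discount rate; not the actual DIM formulation.

def discounted_average_value(energy_values: list[float],
                             capacity_values: list[float],
                             discount_rate: float) -> float:
    """Discount-weighted mean of (energy value + capacity value) in EUR/MWh."""
    weights = [(1 + discount_rate) ** -year for year in range(len(energy_values))]
    total = sum(w * (e + c) for w, e, c in zip(weights, energy_values, capacity_values))
    return total / sum(weights)


# Hypothetical values: the first-year market value alone is below an LCOE of
# 50 EUR/MWh, but later years and the capacity value lift the average.
energy = [44.0, 52.0, 56.0]
capacity = [2.0, 2.0, 2.0]
print(discounted_average_value(energy, capacity, discount_rate=0.05))
```

The example shows why investment can occur even if the market value in the first modeled year lies below the LCOE: what matters in an intertemporal setting is the discounted average over the plant's lifetime.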
Market values – results. The market values in the different models at carbon prices of 27 and 87 EUR/t are summarized in Figure 8. Regarding the condition that the market value equals the LCOE in the market equilibrium, SCO and EMM behave as expected: in the scenarios with endogenous investment, the market values in 2030 exactly match the LCOE. By contrast, SCO does not invest in renewables at a carbon price of 27 EUR/t because the market value is lower than the LCOE.
DIM. The DIM model features higher investments than the other models despite market values in 2030 being lower than the LCOE. This can be explained by the two aforementioned model particularities: market values are higher in some of the future years, and the capacity values, which are considered only in DIM, must be added to the energy market values. Indeed, the discounted average of the 2030–2050 sums of energy and capacity market values (see bars "DIM avg." in Figure 8) is equal to the LCOE. Counterintuitively, there is no clear trend in the future aggregated market values. This is due to several overlapping effects. First, the exogenous decommissioning of nuclear power plant capacity in France tends to increase the market values of onshore wind farms, as low-cost energy production is reduced. Second, the assumed cost degression of renewables leads to higher endogenous wind investments in future years, which in turn reduces future market values in the market equilibrium because of the simultaneity of renewable feed-in. Third, the assumed increase in carbon prices generally increases the marginal costs of carbon-emitting power plants and, with it, the market values of onshore wind farms. These effects are further distorted by exogenous changes in interconnector capacities. The capacity value plays only a minor role when compared to the inter-annual variations.
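The intertemporal condition described for DIM can be sketched as follows. This is an illustration under stated assumptions, not the DIM implementation: the averaging formula (discount-factor-weighted mean), the function name, and the yearly values are hypothetical; only the 8% discount rate is taken from the harmonized assumptions of this study.

```python
def discounted_average(values, rate):
    """Average of yearly values, weighted by discount factors 1 / (1 + rate)**t.

    values: one value per year, starting with the investment year (t = 0)
    rate: annual discount rate, e.g. 0.08
    """
    weights = [1.0 / (1.0 + rate) ** t for t in range(len(values))]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical yearly values in EUR/MWh (not model results).
energy_values = [50.0, 60.0, 70.0]
capacity_values = [2.0, 2.0, 2.0]

# Sum of energy and capacity value per year, then discounted average.
totals = [e + c for e, c in zip(energy_values, capacity_values)]
avg_value = discounted_average(totals, rate=0.08)
```

With rising yearly values, the first-year value lies below the discounted average. This mirrors how investment can be profitable over a plant's lifetime even though the market value in the first year alone falls short of the LCOE.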

Combined heat and power generation
Case study introduction. In Subsection 4.1, part of the heterogeneity found in the model results cannot be explained by the model types (dispatch, investment, and decommissioning). While the models generally agree on a declining role of hard-coal- and lignite-fired generation in the case of high carbon prices in 2030, models of the same type disagree on how much generation and emissions from these sources remain: JMM and EMM report significantly more hard coal and lignite than their counterparts PFL and DIM, respectively. Moreover, the role of coal and lignite in SCO is comparable to or even more pronounced than in the dispatch-only models, even though SCO considers endogenous investment in renewables. In addition, at low carbon prices, the model type cannot explain why hard coal and lignite are reduced more in EMM than in DIM (see Figure 4). One potential reason for this observed heterogeneity lies in the different ways combined heat and power and related heat constraints are treated in each model (recall Subsection 3.1). To explore this hypothesis, Figure 9 contrasts our primary model results (left plots) with a sensitivity analysis in which we exclude CHP restrictions from our modeling altogether (right plots). Comparing each pair of charts reveals how much CHP drives the results for each model. For this comparison, we focus on the scenarios with the lowest and highest carbon prices in 2030.
Figure 9: Electricity-sector carbon emissions and electricity generation in all 26 European countries included in the analysis, contrasting our primary result to a sensitivity without CHP restrictions
Main finding. We find that differences in the modeling of CHP indeed represent an important driver of differences in the model results. Accordingly, disabling heat constraints generally reduces the model heterogeneity that remains beyond the implications of endogenous investment and decommissioning. This is true at both high and low carbon prices, as discussed in more detail below.
Low carbon price. At a low carbon price, neglecting CHP mainly reduces natural-gas-fired electricity generation and related emissions (Figure 9a, b, e, and f). This gap is filled by increased generation and emissions from (cheaper) coal-fired power plants, as electricity generation from CHP units is now driven mainly by marginal generation costs and no longer by heat supply obligations. This is in line with Figure 2, which shows that, at a carbon price of 27 EUR/t, natural gas is the most expensive source of electricity, except for the most inefficient hard-coal-fired power plants. The reduction in natural gas is more pronounced in models with heat constraints at the unit or vintage level (JMM, SCO, and EMM) than in those with national heat constraints (PFL and DIM). The largest reduction can be observed for EMM, which has the most detailed heat constraints outside Germany. In other words, the higher the level of detail in the modeling of CHP, the higher the impact of neglecting CHP, which materializes in reduced must-run electricity generation from natural gas and a corresponding shift to hard-coal- and lignite-fired generation with lower marginal generation costs in the case of low carbon prices. However, the effect of neglecting CHP on overall emissions is ambiguous. Total emissions in the dispatch models remain roughly the same (PFL) or increase (JMM) because reduced CHP-related must-run generation from natural gas can only be replaced by generation from existing hard coal and lignite capacities. By contrast, total emissions decrease in SCO and DIM because of more endogenous investment in renewables. In EMM, the use of hard coal and lignite, as well as renewables, increases such that the total emissions remain constant. While the overall range of total emissions does not change considerably without CHP, the fuel mix becomes more homogeneous across the different models.
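The merit-order reasoning behind this fuel switch can be sketched with the standard short-run marginal cost formula (fuel cost plus carbon cost, divided by electrical efficiency). The plant parameters below are hypothetical round numbers chosen for illustration; they are not the harmonized values from Table 4 or Figure 2, and variable operation and maintenance costs are ignored.

```python
def marginal_cost(fuel_price, emission_factor, efficiency, carbon_price):
    """Short-run marginal cost of electricity in EUR/MWh (electric).

    fuel_price: EUR per MWh of fuel (thermal)
    emission_factor: t CO2 per MWh of fuel (thermal)
    efficiency: electrical efficiency between 0 and 1
    carbon_price: EUR per t CO2
    """
    return (fuel_price + carbon_price * emission_factor) / efficiency


# Hypothetical plant parameters (assumptions, NOT the values used in the study).
PLANTS = {
    "hard coal": {"fuel_price": 10.0, "emission_factor": 0.34, "efficiency": 0.40},
    "natural gas": {"fuel_price": 25.0, "emission_factor": 0.20, "efficiency": 0.55},
}


def merit_order(carbon_price):
    """Return plant names sorted from lowest to highest marginal cost."""
    costs = {
        name: marginal_cost(carbon_price=carbon_price, **params)
        for name, params in PLANTS.items()
    }
    return sorted(costs, key=costs.get)


low = merit_order(27.0)   # carbon price of 27 EUR/t
high = merit_order(87.0)  # carbon price of 87 EUR/t
```

Under these assumed parameters, hard coal is dispatched before natural gas at 27 EUR/t, while the order reverses at 87 EUR/t. A heat-driven must-run constraint would override this cost ranking for CHP units, which is why removing it shifts generation toward whichever fuel is cheaper at the given carbon price.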
High carbon price. At a high carbon price, neglecting CHP reduces hard-coal- and lignite-fired electricity generation and related emissions (Figure 9c, d, g, and h). This is as expected because these are the most expensive fossil fuels at a carbon price of 87 EUR/t (Figure 2). Because the reduction in hard coal and lignite can only be compensated for by less carbon-intensive sources, be it natural gas or endogenous investment in renewables, total emissions decrease across all models. Again, these changes are more pronounced in models with detailed heat constraints. Without CHP, the reported generation and emissions become more homogeneous within model types. The results of EMM are now very close to those of DIM, and the remaining difference can be explained by the higher endogenous investment in DIM (Subsection 4.2). While the results of the dispatch models JMM and PFL also become more similar without CHP, lignite-fired generation and related emissions remain higher in JMM. This may be related to the specific emissions of hard coal being lower in PFL than in JMM, while the specific emissions of lignite are very similar (Figure 5). Because the marginal costs of both technologies are comparable at a carbon price of 87 EUR/t (Figure 2), the marginal costs of hard coal may be just below those of lignite in PFL, while it is the other way around in JMM. Another possible explanation lies in the different aggregation levels of generation units and the corresponding implementations of ramping and cycling constraints. In summary, we find that omitting CHP reduces, but does not fully eliminate, the model differences that remain beyond endogenous investment and decommissioning. Put differently, CHP is an important driver of remaining carbon emissions at high carbon prices. This has implications for both modelers and policy makers, as discussed in the following.

Discussion and conclusions
Model comparison. In this model comparison experiment, we used five detailed numerical models of the European electricity system to replicate the year 2016 and simulate the year 2030, subject to different carbon prices. Many assumptions and parameters were aligned to identify how differences in the model formulations drive differences in results. While all models perform well in backtesting, we find stark differences in 2030, particularly at higher carbon prices. For example, at a price of 87 EUR/t, carbon emissions range from 190 Mt to 490 Mt. This is −46% to +42% around the average emissions of all models (346 Mt). These differences occur even though the model alignment involves some deliberate simplifications, such as the exogenous definition of carbon prices, electricity demand, and net transfer capacity.

Investment modeling. We have identified two main drivers for these differences. Most important is the endogeneity of investment and decommissioning decisions in generation capacity. This results in different capacity stocks, based on which power plant dispatch and emissions are calculated. Two of the models have no endogenous investment; that is, all changes in generation capacity are model inputs. In the other three models, investment is a decision made as part of the optimization procedure. Of those three, two also treat decommissioning endogenously, one assuming perfect foresight until 2070 and the other assuming no inter-annual foresight. One can think of these model types as different views on the drivers of investment and decommissioning decisions in the real world, from being fully independent of changes in market conditions (myopic) to being based on perfect foresight decades into the future.
When investment matters. The way investment is modeled matters most when a rapid transformation of the power system is underway. In our experiment, this becomes particularly evident at high carbon prices, which trigger massive market-based investments in wind power, but only in those models that allow such investments. At 87 EUR/t, the investment models add large quantities of wind power in the range of 26–35 GW per year across Europe. This is much more than the 12 GW built annually during 2011–2020. In this sense, whether investment is considered endogenously may also represent different views on the possibility of scaling up the expansion of certain technologies.

CHP. The second driver of the model differences that we have identified is more technical in nature: the way the heat supply is modeled. Industrial and district heating supplied by CHP plants is a complex engineering problem that restricts power plant operation and investment in multiple ways. In energy system models, these constraints need to be approximated, and there is no single best way of doing this. While all five models contain a detailed representation of CHP plants and heat provision, the exact implementation differs. These differences are responsible for much of the otherwise unexplained model variation. This finding underlines that fossil-fueled CHP can be a source of inertia in deep decarbonization, for which alternative technologies, such as power-to-heat and synthetic fuels, are needed. Scientific assessments that ignore heat supply will hence tend to underestimate the difficulties and costs of deep emission cuts in cold-climate energy systems.
The effectiveness of carbon pricing. Our results yield conclusions on carbon pricing in the power sector in the real world. While higher carbon prices clearly lead to a substantial reduction in carbon emissions, the size of this effect will depend on the feasibility of scaling up market-based investment in renewables and on the possibility of reducing the must-run generation of CHP. The scaling of wind power, for instance, requires land availability, social acceptance, network expansion, and effective permitting processes. To maximize the effectiveness of carbon pricing, other policy measures should aim to improve these prerequisites. Furthermore, for investments to materialize, policy makers must provide sufficient evidence to investors that they are willing and able to make time-consistent choices.
Model-based policy advice. More generally, our study showcases how the results of numerical energy system analyses are driven by modeling choices. The nature of these choices ranges from high-level considerations, such as market-based investment in renewables, to technical details, such as the cogeneration of heat. This has important implications for model-based policy advice. Modelers should make their choices consciously and explicitly, and policy makers should be aware of the uncertainty related to energy system models. Multi-model assessments are recommended to reveal model uncertainty. Given that we compare rather similar models with a detailed representation of the power sector, this conclusion may be even more true when considering the broader range of model types used for policy advice in the energy and climate domains, including electricity grid models, multi-sector energy models, ETS models, integrated assessment models, general equilibrium models, and statistical methods.
Renewables and other technologies. The remaining technologies are constrained in terms of annual generation. For wind and solar power, we define a minimum level according to national renewable energy targets, but further market-based investment is possible (in those models that consider investment) up to the point where national potentials are reached. The hourly generation of these variable renewables, including hydro run-of-river, is fixed to hourly availability profiles. While the annual generation of hydro reservoirs is fixed to the natural inflow, their hourly dispatch is optimized. For simplicity, other renewable and non-renewable generation is fixed to a baseload profile. The emissions related to non-renewable generation are also exogenously defined. Note that, for pumped hydro storage, we distinguish between natural inflow (which is fixed) and pumped inflow (which is optimized).

Load and cost assumptions. Other key input parameters concern the electricity load time series, which are set to the historical values of 2016. For simplicity, we apply the same profiles to 2030, even though electricity demand is likely to increase due to an increased electrification of energy end-uses [26]. Furthermore, cost and efficiency parameters are harmonized across the models according to Table 4, and the discount rate is set to 8% for all investment models.
Assumptions on constraints on power generation. As discussed before, the dispatch of conventional power plants may be constrained by the cogeneration of heat, the provision of reserve energy, and ramping limitations. In this regard, all models use the same heat demand profile for cogeneration and the same assumption on required balancing reserves. However, more detailed assumptions, which are often specific to the different modeling approaches (Subsection 3.1), are necessary. These inputs were not harmonized; that is, the models use their default parameters.