Diagnostic indicators for integrated assessment models of climate policy
Technological Forecasting & Social Change

Integrated assessments of how climate policy interacts with energy-economy systems can be performed by a variety of models with different functional structures. In order to provide insights into why results differ between models, this article proposes a diagnostic scheme that can be applied to a wide range of models. Diagnostics can uncover patterns of model behavior and indicate how results differ between model types. Such insights are informative since model behavior can have a significant impact on projections of climate change mitigation costs and other policy-relevant information. The authors propose diagnostic indicators to characterize model responses to carbon price signals and test these in a diagnostic study of 11 global models. Indicators describe the magnitude of emission abatement and the associated costs relative to a harmonized baseline, the relative changes in carbon intensity and energy intensity, and the extent of transformation in the energy system. This study shows a correlation among indicators suggesting that models can be classified into groups based on common patterns of behavior in response to carbon pricing. Such a classification can help to explain variations among policy-relevant model results.
© 2014 The Authors. Published by Elsevier Inc. This is an open access article under the CC-BY license (http://creativecommons.org/licenses/by/3.0/).


Introduction
Technological Forecasting & Social Change 90 (2015) 45-61
This study presents an approach for diagnosing the behavior of energy-economy and integrated assessment models (IAMs) of the coupled energy-economy-climate system. IAMs are commonly used to analyze the costs and technological implications of long-term climate change mitigation policies [1][2][3][4] (see footnote 3). These models can differ greatly in how much detail they devote to various aspects of the system and in how the components interact. For instance, some IAMs place particular focus on energy technology detail whereas others also represent the land-use sector or macroeconomic feedbacks. Climate policy analysis often involves comparisons among results from several IAMs in order to provide more robust cost estimates and a clearer representation of uncertainties. The AMPERE project, which generated the findings discussed here, is a case in point [6,7]. Given the differences in model structure and assumptions, results vary among models. This variation is informative as it indicates that a range of outcomes is plausible. The task of model diagnostics is to help the policy and integrated assessment community to identify model behavior patterns among this variety of results.
The focus on model behavior differentiates diagnostics from model intercomparisons for policy analysis [8]. Diagnostic analyses do not aim to capture policy dimensions in detail but rather try to characterize the model response to single policy signals, such as a carbon price, to identify and explain model differences. Accordingly, the scenarios used in this study were designed purely for diagnostic purposes and not for policy analysis. An analogous approach has long been applied by the climate modeling community, which has compared the response of general circulation and earth system models to a single climate forcing signal in a number of diagnostic experiments [9].
To date, the IAM community has conducted diagnostic model analyses much more sporadically than the climate modeling community. Early work on estimating and comparing price elasticities across models was conducted by the Stanford Energy Modeling Forum (EMF), which has long included quasi-diagnostic studies of climate policy scenarios in its scope [10]. To some degree, a number of other model comparison studies have included diagnostic model runs, often with pre-defined carbon taxes [11][12][13]. However, few attempts have been made to introduce a comprehensive set of diagnostic experiments and indicators that classify models by key behavioral characteristics and could be used across different studies. One reason may be the IAM community's strong focus on policy applicability. Interest in model diagnostics has been renewed by the recognition that it can be as useful in the IAM context as it has been for climate modeling. Besides AMPERE, model diagnostics is pursued in the DOE-sponsored "Program on Integrated Assessment Modeling Development, Diagnostic and Inter-Comparisons (PIAMDDI)" [14], and has been taken up by the Integrated Assessment Modeling Consortium (IAMC) [15].
The objective and motivation of the diagnostic study are given in the next section. Section 3 discusses the study design that was used to identify and test diagnostic indicators, Section 4 presents the results, and Section 5 introduces a preliminary model classification scheme based on these indicators and their correlations. Section 6 concludes.

Objective and motivation of the diagnostic study
The objective of this study is to establish a characterization of energy-economy and integrated assessment models based on their responses to greenhouse gas pricing scenarios. The scenarios we employ assume idealized climate policy setups since their purpose is diagnostics and not policy analysis. The resulting model characterization aims to provide a better understanding of model outcomes and behavior, which would be useful for model applications to climate policy analysis. Such a characterization should be straightforward enough to help analysts identify important model response patterns even if they are not familiar with the detailed structures of the respective models. For instance, diagnostic indicators can point out whether or not the models involved in an analysis are inclined to show strong energy system or emission responses to a carbon price signal. Diagnostic indicators may also show whether a model's inherent behavior pattern tends to produce high or low mitigation costs for a given emissions reduction target.
Since our objective is to focus on the outcome of model behavior, we primarily apply a top-down approach to model characterization by studying model results rather than starting with a description of model structures and input assumptions. A bottom-up approach, by contrast, would try to develop a model classification based on how comprehensively the economy is represented or based on what assumptions about time preferences and myopia are made. However, the model taxonomy of state-of-the-art IAMs has become very complex, making it harder to perform simple classifications along these lines [16]. In addition, the devil is often in the details. Model specifics such as the availability of certain energy technologies and constraints to the expansion of available technologies can affect model response as strongly as model type. Some of the challenges of identifying the various factors impacting model responses are described in [17]. A bottom-up approach would quickly become impractical as it requires an analyst to know all models in detail. We therefore perform only a very limited bottom-up analysis of the models and primarily focus on the results of a single diagnostic study that is applicable to a large class of models. A set of diagnostic indicators helps to identify response patterns from the model results.

Criteria for diagnostic indicators
To serve the objective of characterizing policy-relevant response patterns among a broad range of models, indicators should at least meet the following criteria:
a) identification of heterogeneity in model responses
b) diagnosis of relevant features for climate policy analysis
c) applicability to diverse models
d) accessibility and ease of use.
Examples of model characteristics with relevance for policy analysis are model dynamics related to climate change mitigation, the associated economic costs, and energy system developments. Since IAMs generally model emissions of at least some greenhouse gases, model behavior related to climate change mitigation can be readily captured by focusing on emissions abatement. Mitigation costs are also reported by most models, although differences among cost reporting methods have to be accounted for. IAMs vary in their coverage of the energy sector and energy end-use sectors, but generally address important aspects such as energy intensity, carbon intensity, and changes in the deployment of energy supply technologies. Thus, we identify four simple and widely applicable diagnostic indicators that characterize model response to climate policy regarding
• the size of emissions reductions relative to baseline emissions without climate policy,
• the reliance on carbon intensity reductions vs. energy intensity reductions to achieve emissions abatement,
• the scale of the transformation of the energy system, and
• the mitigation costs as a function of the carbon price signal and the associated emissions abatement.
Footnote 3: There exists no single definition of integrated assessment models of climate change. The class of IAMs is sometimes restricted to coupled economy-climate models that allow for weighing the costs of mitigating climate change against the damages of unabated climate change (cost-benefit climate policy analysis). Here, we use the term more broadly to include energy-economy-climate models that are used to analyze climate policies [5].

Study design and participating models
To develop and evaluate diagnostic indicators, we compare results from eleven global models that were run with a common set of scenarios to identify model-specific behavior. These scenarios were constructed for the sole purpose of model diagnostics. To improve comparability and narrow down the factors behind model responses to carbon prices, the model teams harmonized their regional assumptions about population and economic growth. More information on the harmonization of underlying population and economic growth assumptions is given in the Supplementary Online Material (SOM).

Baseline and diagnostic scenarios
A baseline scenario and four diagnostic scenarios were run. The baseline does not include any climate mitigation policies and thus no price on greenhouse gas emissions after 2012. The four diagnostic scenarios use globally-harmonized carbon taxes starting in 2010. This start date permits a diagnosis of model responses over several decades even with models with a time horizon that is limited to 2050. Tax levels are given in 2005 USD.
• Two scenarios with a constant carbon tax: one low-tax scenario with a tax of 50 USD per ton CO2 and one high-tax scenario with a tax of 200 USD per ton CO2.
• Two scenarios with an exponentially increasing carbon tax (growing by 4% per year): one starting at a low carbon price (12.50 USD per ton CO2 in 2010), the other at a higher value (50 USD per ton CO2 in 2010). The carbon prices approximately quadruple every 35 years, so that by 2045, the low carbon price reaches about 50 USD and the high carbon price about 200 USD, whereas by 2080, they reach about 200 USD and 800 USD respectively.
Thus, the constant and increasing tax scenarios cross each other around the year 2045.
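The price paths described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `exponential_tax` and its parameters are hypothetical, and annual compounding at exactly 4% is assumed. The cap after 2100 mirrors the rule, stated in the study design, for models whose time horizon extends beyond 2100.

```python
def exponential_tax(year, start_price, growth=0.04, start_year=2010, cap_year=2100):
    """Carbon tax (2005 USD per ton CO2) on an exponentially growing path.

    Assumption: annual compounding. After `cap_year` the tax is held at
    the level reached in `cap_year`.
    """
    effective_year = min(year, cap_year)
    return start_price * (1.0 + growth) ** (effective_year - start_year)


# Growth factor over 35 years at 4% per year: close to, but slightly below, 4.
factor_35_years = exponential_tax(2045, 1.0)

# Compare the increasing paths with the constant 50 and 200 USD taxes.
for year in (2010, 2045, 2080):
    low = exponential_tax(year, 12.50)
    high = exponential_tax(year, 50.0)
    print(f"{year}: low path {low:7.1f} USD, high path {high:7.1f} USD")
```

With 4% annual growth the 35-year factor is slightly below four, so the increasing paths reach the constant 50 and 200 USD levels only approximately in 2045.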
In the diagnostic analysis, we focus on the high tax scenarios that show a stronger model response and thus allow for identifying model characteristics more easily. Of primary interest is the scenario with the tax starting at $50 in 2010 and increasing by 4% per year. Taking into account the model-inherent discounting of future values, it exerts a steadier price signal in present value terms, depending on the choice of discount rate in the models. However, the constant tax case is used for comparison purposes to see if model behavior is consistent across scenarios.
Each carbon tax scenario covers the period until 2100, but models with a time horizon shorter than 2100 have adopted the scenarios until their particular end year. Models with a time horizon extending beyond 2100 fix the carbon tax at the value reached by the year 2100 for later periods.
The detailed definition of all scenarios can be found in the Supplementary Online Material (SOM).

Participating models
The energy-economy and integrated assessment models listed in Table 1 participated in the diagnostic study discussed here.
IAMs differ in numerous ways including their sectoral coverage, solution algorithm, representation of GHG emissions and GHG sources, energy demand and supply sectors, population and GDP baselines, and assumptions about techno-economic parameters [5]. They may be broadly grouped into partial equilibrium (PE) and general equilibrium (GE) models. PE models describe processes and markets in one or more sectors in detail (such as the energy sector, including energy demand by economic sectors and technological specifics) and treat the rest of the economy exogenously. This includes assumptions of price-elastic demand in goods and services provided by the represented sectors. PE models typically maximize consumer and producer surplus or minimize production costs of sectors over time. They may or may not include foresight of future supply and demand in the optimization process. Policy costs are calculated in terms of sector cost mark-ups or reduction of consumer and producer surplus, typically deduced from the area under the marginal abatement cost curve for greenhouse gas emissions.
GE models cover the full economy with a more or less detailed representation of specific economic sectors. GEs can use a dynamic recursive approach or intertemporal optimization. Dynamic recursive computable GEs [18] identify market equilibria for each point in time, with exogenous assumptions conditioning how production technology and the size of the economy progress over time. They are inherently myopic and usually provide a detailed description of the sector composition of the economy. Intertemporal GEs focus on the intertemporal dynamics of investment in production capital under foresight about future production and consumption. They describe a closed economy but can usually only represent one to three aggregated economic sectors due to the computational burden of intertemporal optimization. GEs typically express policy costs in terms of production losses, consumption losses or welfare measures.
Both partial and general equilibrium models can include a great variety of low-carbon technology options on the supply and demand side that can deliver emission reductions in response to climate policy. Table 1 includes a measure of the variety of low-carbon energy supply technology options represented in the participating models. This measure is based on a survey of the energy supply technology representation in the models. It distinguishes three supply sectors: electricity generation, liquids production, and other non-electric energy supply (including hydrogen, gases and heat generation). Its derivation and the numerical results based on the survey of available technologies in the models are discussed in the SOM. Most models include a similarly high variety of low-carbon supply options, but some GE models include a noticeably lower number of options. For simplicity, this measure focuses purely on energy supply side technologies and does not cover demand-side options for emissions reduction and use of low-carbon fuels (e.g., electricity or hydrogen in transport), even though demand side options are explicitly represented in some models and are important determinants of the ability of models to achieve low-carbon futures. Nonetheless, the measure illustrates the fact that by modeling the economy as a whole, GE models may not always include the same level of technological detail as more energy-system-focused PE models.
The fact that the models employed in this study represent different model classes is very important for the identification of useful diagnostic indicators. Broad model coverage is needed to evaluate the robustness of findings about these indicators such as their correlation and their implications for model classification.

Results from the diagnostic study
The diagnostic analysis investigates model behavior by comparing results between the baseline and carbon tax scenarios defined in Section 3.1. We establish indicators characterizing the following model responses to carbon taxes:
• Emissions abatement in response to carbon taxes (Section 4.1)
• Reduction of carbon intensity of energy production compared to the reduction of energy intensity of economic production (Section 4.2)
• Structural changes to the energy system (Section 4.3)
• Economic implications of carbon pricing (Section 4.4).
Since the carbon tax in either scenario takes effect in 2010, we already see significant reactions to high tax levels in that year. In the increasing tax case, all models continue to reduce emissions throughout the time horizon, whereas in the constant tax case, some models show an upward reversal in the long term after an initial reduction. This is due to the depreciation of the current value of the carbon tax in a growing economy. In both tax cases, there are pronounced model differences. MERGE-ETL shows a particularly high increase in baseline emissions due to wide-spread adoption of coal-to-liquids production. GCAM and WITCH show strong emission reductions in the early years of the constant carbon tax scenario. This is contrary to current real-world trends, but it should be noted that the global carbon tax of $200 far exceeds any climate policy efforts to date both in its level and in its coverage. In the later years of the increasing carbon tax case, GCAM shows very high negative emissions, primarily due to a large potential for bioenergy carbon capture and sequestration (BECCS), which is exploited in the case of high carbon prices.

Relative abatement index
To illustrate the model differences, we define a relative abatement index (RAI) characterizing the emission reductions in a carbon tax scenario relative to the baseline:
$$\mathrm{RAI}(t) = \frac{\mathrm{CO_2\,FFI}_{\mathrm{Base}}(t) - \mathrm{CO_2\,FFI}_{\mathrm{Pol}}(t)}{\mathrm{CO_2\,FFI}_{\mathrm{Base}}(t)}$$
where $\mathrm{CO_2\,FFI}_{\mathrm{Pol}}(t)$ indicates the CO2 FFI emissions in the carbon tax case and $\mathrm{CO_2\,FFI}_{\mathrm{Base}}(t)$ the emissions in the baseline at time t. We have focused on CO2 emissions from fossil fuel combustion and industry (CO2 FFI) in the definition of the indicator because these emissions are captured by all energy-economy and integrated assessment models that are used for climate policy analysis. The choice of a larger collection of greenhouse gases and sectors would have already excluded some models from the diagnostic analysis. In addition, the energy sector is the main venue for emission reductions [19], so key characteristics of the emissions response are captured by CO2 FFI emissions. Fig. 2 shows the emissions abatement relative to the baseline CO2 FFI emissions over time and across models for both the exponentially increasing and the constant high carbon tax cases.
We observe the following:
• In the exponentially increasing tax scenario, the RAI increases over time. Among most models, this rise slows down in the latter half of the century, as models other than GCAM find fewer emissions reduction opportunities once emissions are close to zero or negative.
• In the constant tax scenario, most models increase their RAI over the first few decades. In the second half of the century, the depreciation of the current value of the carbon tax leads to an attenuation and in some cases reversal of the trend in relative emission reductions among all models. Only POLES continues to consistently increase its relative emissions abatement through 2100.
• The ordering of models along their RAI is fairly robust over time and carbon tax scenarios. We can clearly identify a group of models with a stronger relative abatement (MERGE-ETL, IMAGE, MESSAGE, REMIND, and particularly GCAM) and a group of models exhibiting less abatement (DNE21+, GEM-E3, IMACLIM, WITCH, and AIM-Enduse). The POLES model initially shows less abatement as it accounts for constraints to new technology diffusion in the short and medium term, while over time it moves to the high-abatement model group as low-carbon technologies progress. Although AIM-Enduse is headed for a similar trajectory as POLES, its model period ends in 2050, when its abatement response is still comparatively small.
The relative abatement index can also show whether a model's abatement at the global level is a good predictor of its relative abatement at the regional level. In principle, regional marginal abatement cost curves should vary with regional differences in technology performance and costs, energy resource endowments and final energy demand. Among the participating global models, such regional variations are generally much less significant than the inter-model differences, as illustrated by Fig. S1 in the SOM. The models with higher RAI values on the global level also have higher RAI across the regions.
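As a minimal sketch of how the relative abatement index might be computed from reported emission pathways, the snippet below assumes emissions are stored as year-to-value dictionaries. The function name and the emission numbers are hypothetical illustrations, not results from any participating model.

```python
def relative_abatement_index(baseline, policy):
    """Emission reduction relative to the baseline for each reported year.

    RAI(t) = (baseline(t) - policy(t)) / baseline(t)
    """
    return {year: (baseline[year] - policy[year]) / baseline[year]
            for year in baseline}


# Hypothetical CO2 FFI emissions in Gt CO2 per year (illustrative only).
baseline_emissions = {2030: 40.0, 2050: 50.0, 2100: 60.0}
policy_emissions = {2030: 30.0, 2050: 20.0, 2100: -6.0}

rai = relative_abatement_index(baseline_emissions, policy_emissions)
for year, value in sorted(rai.items()):
    print(year, round(value, 2))
```

An RAI above one, as in the hypothetical 2100 value, corresponds to net-negative emissions in the policy case.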

Abatement under different carbon price levels
The RAI as constructed above shows model responses to particular carbon price levels. We can also characterize the impact of carbon price levels with marginal abatement cost (MAC) curves that show emission reductions as a function of carbon price. A common feature of the MAC curves (in 2045) of all models except GCAM is that the relative emissions reduction in the low carbon tax scenario is similar to or larger than the additional emissions reduction between the low carbon tax and the four times higher carbon tax. This gives the MAC curves a convex form with a steepening slope for increasing carbon price levels and increasing convexity over time. In the second half of the century, even the lower carbon tax leads to an exploitation of most available emissions reduction opportunities so that the additional abatement opportunities are diminishing. Among the participating models, GCAM alone shows high additional reductions from a higher tax level. This is because GCAM allows for significant expansion of lands for bioenergy crops in the carbon tax case, facilitated partly by changes in human diets away from cattle and resulting in large-scale bioenergy carbon capture and storage (BECCS) and negative emissions [20].

Energy use and carbon intensity as CO2 emission drivers
A useful tool for analyzing the differences between models in terms of CO2 FFI emissions and emission reductions is the Kaya identity [21] that decomposes the emissions into four factors: population (Pop), per capita income, final energy (FE) intensity of economic production (GDP), and carbon intensity of energy use (after subtraction of carbon captured and stored via CCS technology):
$$\mathrm{CO_2\,FFI} = \mathrm{Pop} \times \frac{\mathrm{GDP}}{\mathrm{Pop}} \times \frac{\mathrm{FE}}{\mathrm{GDP}} \times \frac{\mathrm{CO_2\,FFI}}{\mathrm{FE}}$$
The first two factors, economic activity and population, were harmonized between the model baselines, and should therefore not contribute much to model differences. Even among the carbon tax scenarios in the general equilibrium models, where GDP responds to carbon pricing, the variation in GDP is generally far smaller than the changes in carbon and energy intensity. Therefore, the harmonization of economic growth and population assumptions allows us to focus on carbon and energy intensity as the driving factors of differences in the model results. Fig. 4 plots carbon intensity (as a fraction of carbon intensity in the baseline) against energy intensity (as a fraction of energy intensity in the baseline) across models and carbon tax scenarios. Both carbon and energy intensity reductions increase with the stringency of the carbon tax scenario. In all cases, carbon intensity is reduced more strongly than energy intensity. For high carbon prices, carbon intensity can become negative if the model produces net-negative CO 2 emissions from the large scale adoption of bioenergy combined with CCS.
Based on the Kaya identity, we can say that if the change in GDP is small, the residual CO2 FFI emissions (Res(CO2)), expressed as a fraction of baseline emissions, are approximately a function of residual carbon intensity (Res(CI)) and residual energy intensity (Res(EI)) as fractions of baseline intensities:
$$\mathrm{Res}(\mathrm{CO_2}) \approx \mathrm{Res}(\mathrm{CI}) \times \mathrm{Res}(\mathrm{EI})$$
A reduction of CO2 emissions by, for example, 75% can therefore be achieved by a multitude of combinations of CI and EI reductions. For example, reducing one factor by 75% and leaving the other factor unchanged or reducing both factors by 50% leads to the same emissions reduction.
We construct a diagnostic indicator CoEI (carbon intensity over energy intensity) that captures the proportionality of carbon and energy intensity reductions in response to carbon prices:
$$\mathrm{CoEI} = \frac{\mathrm{Res}(\mathrm{CI})}{\mathrm{Res}(\mathrm{EI})}$$
The CoEI is larger than one if energy intensity is reduced more strongly than carbon intensity (Res(CI) > Res(EI)) and smaller than one in the opposite case. Given the strong reduction in carbon intensity displayed in Fig. 4, we expect CoEI < 1 across models.
Fig. 5 shows the development of the CoEI over time for the constant and exponentially increasing high tax scenarios. It can be seen that all models increasingly rely on carbon intensity reductions in the increasing tax scenario (CoEI decreasing over time), while the ratio of energy and carbon intensity reductions stabilizes in the constant tax scenario. As models go to negative emissions (i.e., negative carbon intensities) under the increasing carbon tax, the decrease of the CoEI generally levels off, indicating a limit on the amount of net negative emissions that can be achieved (although GCAM reaches this limit at a strongly negative level). In the constant carbon tax case, all models except POLES show a leveling off of the CoEI decrease in the second half of the century, whether negative emissions are achieved or not. This may be due to the models avoiding the most expensive decarbonization options without an increase in the carbon price.
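The relation between residual emissions and the CoEI can be illustrated with a short sketch. It assumes the CoEI is the ratio of residual carbon intensity to residual energy intensity, as the name and the surrounding text suggest, and that the GDP change is negligible. The two routes and all names are hypothetical.

```python
def coei(res_ci, res_ei):
    """Residual carbon intensity over residual energy intensity.

    Both arguments are policy-case intensities expressed as fractions of
    the corresponding baseline intensities.
    """
    return res_ci / res_ei


# Two hypothetical routes to the same 75% emissions reduction.
routes = {
    "carbon intensity only": (0.25, 1.00),  # CI cut by 75%, EI unchanged
    "both equally": (0.50, 0.50),           # CI and EI each cut by 50%
}

summary = {}
for name, (res_ci, res_ei) in routes.items():
    res_co2 = res_ci * res_ei  # approximation valid for small GDP changes
    summary[name] = (res_co2, coei(res_ci, res_ei))
    print(f"{name}: residual CO2 = {res_co2:.2f}, CoEI = {summary[name][1]:.2f}")
```

Both routes leave a quarter of baseline emissions, but the CoEI separates them: it is well below one when abatement relies on decarbonization and equals one when both intensities fall in proportion.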
The model differences in the CoEI are fairly robust across carbon tax scenarios from 2040 on. We can identify different groups of models: some that reduce CI only slightly more than EI (WITCH, GEM-E3, IMACLIM, DNE21+), and some that are strongly inclined to reduce CI (IMAGE, MERGE-ETL, MESSAGE, REMIND and particularly GCAM, which again forms a class of its own due to its large potential for negative emissions). AIM-Enduse and POLES move from the first group toward the second group over time as low-carbon technologies become more widespread. These model groups are somewhat comparable to the groupings based on the relative abatement index (see Section 4.1).

Structural changes to the energy system
A closer look at how the structure of the energy system responds to carbon taxes can help to better understand the inherent model differences in the substitutability between alternative technology options and its influence on differences in CO2 FFI emissions and carbon intensity.
The left panel of Fig. 6 shows changes in the structure of final energy use by type of delivered energy. Final energy is categorized as solids (i.e. coal and biomass used directly by end users), liquids (primarily oil), and grids and hydrogen. Grids include electricity, piped gas and district heat. In the no-policy baseline scenario, models consistently move toward grids in the long run, indicating an electrification of final energy use. In the carbon tax case, this trend is accelerated in most models, albeit to a varying degree.
Changes of the primary energy mix are shown in the right-hand panel of Fig. 6. All models show that the carbon tax pushes primary energy away from fossil fuels, though the extent varies greatly between models. Compared to the other models, the carbon tax brings only low gains for non-fossil energy in IMACLIM due to strong deployment of carbon capture and storage for fossil energy. For most models, the carbon tax induces a strong shift away from coal, which in the baseline scenario would gain an increasing share of the energy mix.
As most models show that the carbon tax shifts final energy toward grids and primary energy toward non-fossils, the rate at which these categories are transformed can be indicative of how flexible the energy mix is in response to a carbon price. The rate of transformation can be expressed with transformation indicators that measure the changes in the energy mix relative to 2005 (see [22,23] for applications of a similar metric to measure the distance between different technology portfolios). We apply a transformation indicator (TI) that ranges between 0 and 2. A TI of 0 indicates no change in the share among the variables of the category; 2 indicates an absolute shift with one variable rising from 0% to 100% of the share and another falling from 100% to 0%. Shifts between these extremes have a TI between 0 and 2. The TI, relating to the base year of 2005, can be defined as follows, with $S_1$, $S_2$, etc. indicating the shares of the various components of the energy system or sector:
$$\mathrm{TI}(t) = \sum_i \left| S_i(t) - S_i(2005) \right|$$
The progression of the transformation indices for the energy mix in the carbon tax scenario is shown in Fig. 7. For primary energy, the TI is generally higher than for final energy, which indicates that models find it easier to substitute primary energy carriers and power generation types than to shift final energy between solids, liquids and grids.
Some models show a clear correlation between the TIs for final energy and primary energy. GCAM, MERGE-ETL, MESSAGE and REMIND have a high TI for both final and primary energy, whereas WITCH has a low TI for both and DNE21+ and POLES have medium values for both indices. No such clear correlation can be seen for AIM-Enduse and IMAGE. AIM-Enduse starts with a low primary energy TI that shows an upward trend after 2040, partly due to increasing market maturity of solar power, while its final energy TI remains low. IMAGE has a medium primary energy TI, but its final energy TI is high, indicating that IMAGE relies more on substitution of energy end use carriers than other models.
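A transformation indicator with the stated range of 0 to 2 can be sketched as the sum of absolute share changes relative to the base year. The sketch assumes that this is the functional form intended; the share values and category names below are hypothetical.

```python
def transformation_indicator(shares_t, shares_base):
    """Sum of absolute changes in shares relative to the base year.

    Both arguments map category names to shares that sum to one.
    Categories missing from one mapping are treated as a zero share.
    """
    categories = set(shares_t) | set(shares_base)
    return sum(abs(shares_t.get(c, 0.0) - shares_base.get(c, 0.0))
               for c in categories)


# Hypothetical primary energy shares (illustrative, not model output).
shares_2005 = {"coal": 0.3, "oil_gas": 0.5, "non_fossil": 0.2}
shares_2050 = {"coal": 0.1, "oil_gas": 0.3, "non_fossil": 0.6}

ti_2050 = transformation_indicator(shares_2050, shares_2005)
print(round(ti_2050, 2))
```

The two limiting cases follow directly: identical share vectors give 0, and a complete switch from one category to another gives 2.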

Mitigation costs
The economic implications of a carbon tax are typically captured in terms of mitigation costs that are derived from comparing the policy scenario with the counterfactual baseline case that does not include climate policy. For general equilibrium models, the mitigation costs can be expressed as losses in welfare, consumption or GDP relative to the baseline case. The first two metrics directly measure the impact on private income and consumption. GDP is a less satisfactory indicator because it is a measure of output, which includes not only consumption, but also investment, imports, exports, and government spending [24]. Partial equilibrium models do not include the feedback on economy-wide production and household consumption but can express mitigation costs in terms of the change in consumer and producer surplus often deduced from the area under the marginal abatement cost curve. An alternative measure is additional energy system costs compared to the baseline case.
The mitigation costs from general and partial equilibrium models are not fully comparable. We nonetheless present costs from both model types next to each other, focusing especially on the relative changes. Comparisons between the cost measures of general equilibrium (GE) and partial equilibrium (PE) models have shown that, in relative terms, they seem to correlate reasonably well across scenarios and regions [25]. The intertemporally aggregated mitigation costs from the general equilibrium models in this diagnostics study (GEM-E3, IMACLIM, MERGE-ETL, MESSAGE, REMIND, WITCH) are given in terms of the net present value of consumption losses as a percentage of net present value consumption in the baseline (all discounted at 4% per year). The intertemporal mitigation costs from the partial equilibrium models (DNE21+, GCAM, IMAGE, POLES) are given in terms of the net present value of the area under the MAC curve (GCAM, IMAGE, POLES) or in terms of additional energy system costs (DNE21+) as a percentage of net present value GDP in the baseline (all discounted at 4% per year). Although the carbon price in the increasing tax scenario rises far higher over the second half of the century, the constant tax scenario includes the impact of an early price shock. Models differ on which tax scenario leads to the highest costs until 2100. With the exception of GCAM, the partial equilibrium models show lower mitigation costs than the general equilibrium models. This is an indication of the differences in cost metrics. GCAM is an exception that may be explained by the very high abatement response to carbon pricing shown by this model. Among all models, IMACLIM stands out by showing the highest mitigation costs across scenarios. This is due to assumptions of imperfect foresight combined with market and institutional imperfections that, under a carbon tax, can result in GDP losses that are far more significant than in the case of economies with frictionless markets and non-distortive fiscal systems.
Fig. 9 shows a positive correlation between mitigation costs and cumulative emissions reductions from different carbon tax scenarios in 2050. This correlation is still mostly intact in 2100, although some models (IMACLIM, MESSAGE, WITCH) suggest that the increasing high tax case ($50 increasing) can lead to somewhat lower costs relative to the amount of emission reductions than the constant high ($200) tax case. The initial shock from the constant $200 tax can be very costly in the short term, and the 4% discount rate gives a large weight to such short-term losses.
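The general equilibrium cost metric described above (net present value of consumption losses as a percentage of net present value baseline consumption, discounted at 4% per year) can be illustrated with a minimal sketch. The function names, the annual time step, and the input numbers are assumptions made for illustration; they are not taken from any of the participating models, which typically report in multi-year time steps.

```python
def npv(values, years, rate=0.04, base_year=2010):
    """Net present value of a yearly series, discounted back to base_year."""
    return sum(v / (1.0 + rate) ** (y - base_year) for v, y in zip(values, years))


def npv_cost_share(baseline, policy, years, rate=0.04, base_year=2010):
    """NPV of losses (baseline minus policy) as a percentage of NPV baseline."""
    losses = [b - p for b, p in zip(baseline, policy)]
    return 100.0 * npv(losses, years, rate, base_year) / npv(baseline, years, rate, base_year)


# Hypothetical numbers for illustration only (not model results).
years = [2010, 2011, 2012]
baseline_consumption = [100.0, 104.0, 108.16]
policy_consumption = [99.0, 102.96, 107.0784]  # 1% below baseline in every year

share = npv_cost_share(baseline_consumption, policy_consumption, years)
print(f"NPV consumption loss: {share:.2f}% of NPV baseline consumption")
```

Because the policy path in this toy example lies exactly 1% below the baseline in every year, the discounted share is also 1%, regardless of the discount rate. With an early cost shock, as in the constant $200 tax case, losses in early years would weigh more heavily in the discounted sum than equally sized relative losses late in the century.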

Cost per abatement value
As can be seen in Figs. 8 and 9, differences in the cost levels across models increase with increasing mitigation costs in the more stringent carbon tax scenarios. We can study the model differences through a cost per abatement value (CAV) indicator that relates the mitigation costs (MitCosts) over the period from 2010 to year t, discounted at a rate r of 4% per year, to the value of reduced emissions, measured as greenhouse gas emissions reductions (GHGRed) times the carbon price (CPrice) over the same period 2010 to t, also discounted at 4%. This measure includes all Kyoto gases represented in a model since the study setup assumes that the carbon tax is applied to Kyoto gases aside from CO2 by using global warming potentials (GWPs) as conversion factors. The CAV is defined as follows:

$$\mathrm{CAV}(t) = \frac{\sum_{\tau=2010}^{t} \mathrm{MitCosts}(\tau)\,(1+r)^{-(\tau-2010)}}{\sum_{\tau=2010}^{t} \mathrm{GHGRed}(\tau)\,\mathrm{CPrice}(\tau)\,(1+r)^{-(\tau-2010)}}$$

The result is a dimensionless number signifying the economic implications of emissions abatement resulting from carbon pricing. A high CAV means comparatively higher mitigation costs for a given emissions reduction and carbon price trajectory than in the case with a low CAV. For partial equilibrium models, it essentially describes the ratio between average and marginal abatement costs. For general equilibrium models, macro-economic feedbacks are also factored in. This becomes particularly evident for IMACLIM, for which the CAV exceeds unity until mid-century.

Fig. 10 shows the development of the cost per abatement value indicator for the exponentially increasing and constant high tax scenarios. In the exponential carbon tax case, the CAV is declining over time for all models. This indicates that after discounting, the increase in mitigation cost is more than outweighed by the increase in emission reductions, taking into account that the present value of the carbon tax remains constant when assuming a discount rate of 4% per year. The constant tax scenario results in a relatively constant CAV.
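To make the CAV description above concrete, the following sketch computes the ratio of discounted mitigation costs to discounted abatement value for a single hypothetical model. It assumes discrete reporting years that are summed without interpolation, consistent units (for example carbon prices in $/tCO2e and reductions in tCO2e, so that costs are in $), and invented input values; none of these details come from the study itself.

```python
def discounted_sum(values, years, rate=0.04, base_year=2010):
    """Sum of values discounted back to base_year at the given annual rate."""
    return sum(v / (1.0 + rate) ** (y - base_year) for v, y in zip(values, years))


def cost_per_abatement_value(mit_costs, ghg_red, c_price, years,
                             rate=0.04, base_year=2010):
    """Discounted mitigation costs divided by discounted (reductions x price)."""
    abatement_value = [q * p for q, p in zip(ghg_red, c_price)]
    denominator = discounted_sum(abatement_value, years, rate, base_year)
    if denominator == 0:
        raise ValueError("abatement value is zero; CAV is undefined")
    return discounted_sum(mit_costs, years, rate, base_year) / denominator


# Hypothetical inputs for illustration only (not model results).
years = [2010, 2020, 2030]
c_price = [50.0 * 1.04 ** (y - 2010) for y in years]  # assumed price path
ghg_red = [10.0, 20.0, 30.0]                          # assumed reductions
# Assumed linear marginal abatement cost curve: the area under the curve
# is half of reductions times price in every period.
mit_costs = [0.5 * q * p for q, p in zip(ghg_red, c_price)]

cav = cost_per_abatement_value(mit_costs, ghg_red, c_price, years)
print(f"CAV = {cav:.2f}")
```

With the assumed linear marginal abatement cost curve, average costs are half of marginal costs in every period, so the sketch yields a CAV of 0.5. This mirrors the interpretation given for partial equilibrium models; values above unity, as reported for IMACLIM, would require costs beyond the area under such a curve.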
The ordering of models in terms of CAV is largely preserved over time and across the increasing and constant tax scenarios. IMACLIM shows CAV values that are multiple times as high as those of the other models. DNE21+, GEM-E3, GCAM, REMIND and WITCH have medium CAV values. IMAGE, MERGE-ETL, MESSAGE, and POLES come in at the low end.

Model characterization based on diagnostic indicators
We have developed a set of diagnostic indicators to characterize the model response to carbon pricing in various dimensions:
• Cumulated CO2 FFI emissions reductions (relative abatement index)
• Carbon intensity vs. energy intensity reductions (CoEI indicator)
• Structural changes in energy use (primary energy transformation index; see footnote 4)
• Mitigation costs (cost per abatement value).
In Section 5.1, we check for correlations among these indicators to examine the potential to classify models based on indicator combinations. Section 5.2 presents a preliminary model classification scheme.

Correlation of indicators
A large response of the primary energy mix to a carbon tax will result in a strong reduction of carbon intensity as, in most models, the primary energy mix shifts away from fossil fuels due to the tax. We would thus expect that models with a high primary energy TI are strongly inclined to reduce carbon intensity and exhibit relatively low carbon over energy intensity (CoEI) values. Everything else being equal, a lower carbon intensity translates to lower emissions and thus higher emissions abatement. Fig. 11 plots the primary energy TI results of this study against the CoEI. Fig. 12 plots the primary energy TI against the relative abatement index (RAI). While a negative correlation between the TI and the CoEI and a positive correlation between the TI and the RAI are clearly confirmed, these correlations are stronger in 2050 than in 2100. This is because even a strong transformation of the primary energy mix toward low carbon energy sources reaches its limits in further reducing carbon intensity and emissions, except in GCAM, where the large shift to bioenergy with CCS continues to boost negative emissions as carbon prices increase.
We also investigate the correlation of the CoEI with the RAI (Fig. 13). There is indeed a negative correlation between CoEI and RAI. Higher relative abatement (high RAI) tends to come with a strong inclination to reduce CI (low CoEI). The negative correlation of RAI and CoEI is strongest in situations of a significant carbon tax signal that has not yet pushed the decarbonization to its limits. In the case of a low carbon tax signal, the model response may not be strong enough to induce a clear correlation (see Fig. S2 in the SOM).
The correlation between the abatement response to carbon prices and the inclination to reduce carbon intensity is an important result of our diagnostic analysis. The RAI and CoEI are complementary by construction, since the RAI is related to the residual CO2 FFI emissions, which in principle can be achieved with a large range of CoEIs. The negative correlation between the RAI and CoEI (Fig. 13), the positive correlation between the RAI and the TI (Fig. 12), and the negative correlation between the TI and CoEI (Fig. 11) suggest that many models show a high/low/high or a low/high/low pattern for RAI/CoEI/TI.
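The pairwise checks described above can be sketched as a simple Pearson correlation across models. The indicator values below are hypothetical placeholders chosen only to show the expected sign pattern; they are not the values underlying Figs. 11 to 13, and the choice of the Pearson coefficient is an assumption, since the text does not name a specific correlation measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical indicator values for five models (illustration only).
rai = [0.2, 0.3, 0.5, 0.6, 0.8]   # relative abatement index
coei = [1.5, 1.2, 0.9, 0.7, 0.4]  # carbon over energy intensity indicator
ti = [0.1, 0.2, 0.35, 0.5, 0.7]   # primary energy transformation index

r_rai_coei = pearson(rai, coei)
r_rai_ti = pearson(rai, ti)
r_ti_coei = pearson(ti, coei)
print(f"RAI/CoEI: {r_rai_coei:+.2f}  RAI/TI: {r_rai_ti:+.2f}  TI/CoEI: {r_ti_coei:+.2f}")
```

A negative, positive, negative sign pattern for RAI/CoEI, RAI/TI and TI/CoEI corresponds to the high/low/high and low/high/low groupings discussed in the text. With only 11 models, any such coefficient should be read as indicative rather than statistically conclusive.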

Model classification
Section 5.1 has identified correlations among the diagnostic indicators on emissions, energy and carbon intensity, and energy system response. This means that their combination might allow us to characterize not only a single model but also a larger group of models. Such a classification would make it easier to identify patterns among the spread of model results in model intercomparison studies that aim to inform policy-relevant questions. What follows is only a preliminary attempt at such a classification that aims to illustrate one important application of model diagnostics.

Footnote 4: We use the transformation index for primary energy supply to represent the structural changes in the energy system since it shows the more significant changes among the transformation indices discussed in Section 4.3 and is also the most broadly applicable to existing modeling frameworks.

Classification of participating models
In Table 2, we use simple qualitative characterizations of the indicator values: low, high, and medium/mixed categories based on the results shown in Section 4. The medium/mixed category is for those models that fall in between the low and high clusters or that move from one cluster to the other over time. For an overview of the numerical values on which the diagnostic indicators for each model are based, view Table S4 in the SOM.
Some models show indicator values that are fully in line with a low-response vs. high-response classification. A high-response model would be expected to show higher emission reductions, lower carbon intensity relative to energy intensity, and a more decisive transformation of the primary energy mix, that is, a high/low/high pattern with regard to RAI/CoEI/TI. Conversely, low-response models would be expected to show a low/high/low pattern. This is true for DNE21+, GEM-E3 and WITCH, which can be classified as low-response models, and for GCAM, MERGE-ETL, REMIND and MESSAGE, which can be identified as high-response models. The indicators of the other four models do not fit the low, high or medium response patterns fully, as they include medium or mixed values. Nevertheless, at least two indicators of most models match the low-response vs. high-response patterns. For this preliminary classification, we define IMACLIM as a low-response model, IMAGE as a high-response model and AIM-Enduse and POLES as medium-response models based on the largest overlap with the low/medium/high response classes defined in Table 3.
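The overlap-based assignment described above can be sketched as a small rule that counts how many of the three qualitative indicator values match each response pattern. The threshold of two matching indicators and the fallback to a medium-response class are assumptions made for this illustration; the actual class definitions are those of Table 3, which this sketch does not claim to reproduce.

```python
# Expected qualitative values for (RAI, CoEI, TI) as stated in the text.
PATTERNS = {
    "low-response": ("low", "high", "low"),
    "high-response": ("high", "low", "high"),
}


def classify(rai, coei, ti, min_matches=2):
    """Assign a response class from qualitative indicator values.

    Each argument is one of "low", "medium" or "high". A model is assigned
    to a pattern when at least min_matches indicators agree with it
    (assumed rule); otherwise it is labeled medium-response.
    """
    observed = (rai, coei, ti)
    for name, pattern in PATTERNS.items():
        matches = sum(o == e for o, e in zip(observed, pattern))
        if matches >= min_matches:
            return name
    return "medium-response"


print(classify("high", "low", "high"))     # full high/low/high pattern
print(classify("low", "high", "medium"))   # two of three indicators match
print(classify("medium", "medium", "low")) # no clear pattern
```

Since the two patterns differ in every position, at most one of them can reach two matches, so the order in which the patterns are checked does not affect the result.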
There is no clear correlation between the cost per abatement value (CAV) indicator and the model classification based on RAI, CoEI and TI. This may be due to the fact that high-response models may exhibit both relatively high emission reductions and high mitigation costs that boost both the numerator and the denominator of the CAV. This would indicate that the CAV provides complementary information to the high- vs. low-response model characteristics. Section 4.4 suggests that high CAV values come from the subset of equilibrium models that include the full impact on the economy.
It is worth noting that some of the models classified as low-response models in Table 2 are also among the models with a low measure of low-carbon energy supply technology variety, as shown in Table 1. However, a high technology variety measure does not automatically lead to the classification as a high-response model. The measure only covers the supply side, and a high variety of low-carbon technologies does not automatically translate into their large-scale use.

Preliminary classification scheme
From our classification of the participating models, we derive a preliminary classification scheme based on the indicators RAI, CoEI and TI and the model type (Table 3).
We have added the model type to the classification scheme rather than the CAV itself because the model type determines the cost components that can actually be measured. However, this may be changed in the future, or the CAV may be added, as our understanding of the explanatory power of the CAV improves. The CAV definitely provides useful additional information about the magnitude of mitigation costs in climate policy analyses with quantitative mitigation targets. In such a setting, the required emissions reductions are largely fixed, and the low vs. high responsiveness of models will give a good indication of the level of carbon prices that is needed in the models to reach the target. Thus, the choice of mitigation target and the responsiveness of models largely determine the denominator of the CAV. The indicator itself will then specify to what extent the abatement value translates into mitigation costs. Therefore, the highest cost estimates can be expected from general equilibrium models measuring full economic costs with low responsiveness and high CAV (IMACLIM and to a lesser extent GEM-E3 and WITCH). In turn, the lowest cost estimates can be expected from partial equilibrium models with high responsiveness and low CAV (IMAGE). We note that models that do not match but are close to the indicator patterns in Table 3 can also be classified based on this scheme. For instance, a model with a low abatement index, a high CoEI and a medium or mixed transformation index can be considered a low-response model.
The derived preliminary classification serves mainly illustrative purposes. Further research and the integration of diagnostic results from more models are needed to establish a robust classification scheme. Ultimately, the value of such a scheme needs to be judged against its ability to characterize differences in model outputs and policy implications in a variety of contexts. As a preliminary test of the model classification, we present results from the AMPERE model intercomparison studies on delayed and fragmented action [6,7] in Fig. 14. These results are from scenarios for atmospheric greenhouse gas concentration targets at levels of approx. 450 and 550 ppm CO2e. Fig. 14 shows mitigation costs (in NPV consumption or partial equilibrium costs, discounted at 4% per year) for the 450 and 550 ppm CO2e stabilization cases. Models are colored according to the classification identified above. It can be seen that costs decrease from low-response to high-response models and are higher for GE than for PE models. As expected, IMACLIM exhibits the highest costs (exceeding the upper boundaries of Fig. 14) and IMAGE the lowest. Thus, our preliminary classification passes this initial test of its explanatory power for the differences in cost estimates. However, the test sample is too small to draw definitive conclusions at this point.
It should finally be noted that the classification that we used here relates to specific versions of the respective models. Updates in model structure or parameters could change the relative position of different models, for instance through the inclusion of new mitigation options.

Summary and conclusion
We have studied and compared the emissions, energy, and economic response to a carbon price signal across 11 global energy-economy and integrated assessment models. The diagnostic study setup consisted of a no-climate-policy baseline and a series of constant and exponentially rising carbon tax scenarios with globally harmonized carbon prices. The study setup can be adopted by global and regional models alike. We found the increasing tax variant to be most useful for identifying diagnostic indicators for abatement response and economic impacts because the present value of a constant carbon tax depreciates over time in the dynamic setting of a growing economy.

Fig. 14 (caption, beginning missing in source): … Table 3 (PE, low response: red; PE, medium response: dark red; PE, high response: green; GE, low response: black; GE, high response: yellow). To keep the y-axis at a scale that provides good visibility of most models' graphs, these panels exclude IMACLIM, which shows NPV policy costs of 6-7% for the 550 ppm scenario and 9-10% for the 450 ppm scenario for the periods 2010-2050 and 2010-2100.

We have developed four diagnostic indicators to characterize the model responses based on the criteria of characterizing model heterogeneity, relevance for climate policy analysis, applicability to diverse models, and accessibility and ease of use. These indicators are the relative abatement index (RAI), measuring the scale of emissions reductions; the carbon over energy intensity indicator (CoEI), measuring the reliance on carbon intensity vs. energy intensity reductions; the transformation index (TI), measuring the scale of transforming the primary energy mix; and the cost per abatement value (CAV) indicator, measuring the mitigation costs as a function of carbon prices and emissions reductions.
A key result of the diagnostic analysis is the identification of strong correlations between the different diagnostic indicators. Models with higher relative abatement (high RAI) tend to also exhibit a stronger reliance on carbon intensity reductions (low CoEI) and a more significant transformation of the energy sector (high transformation indices). At the opposite end, we find models with lower relative abatement, a smaller reliance on carbon intensity reductions (high CoEI) and much smaller changes to the structure of the energy system (low transformation indices). When compared with each other, most models that participated in the diagnostic study fell into one of these two categories. These correlations point to a distinct fingerprint of model structure that emerges in various dimensions.
We used the correlation between diagnostic indicators to establish a preliminary classification scheme and to illustrate potential next steps in the diagnostic work. Establishing and vetting a robust and useful classification scheme will require a community effort involving more models and diagnostic experiments as well as tests in applied contexts. We are hopeful that the findings and suggestions presented here can help encourage the next research steps and advance the discussion about diagnostic standards.