GWP* is a model, not a metric




Introduction
To aggregate different greenhouse gases, the UNFCCC common reporting formats, the IPCC inventory guidelines, and the Paris Agreement rulebook use an emission metric that has been around for more than 30 years: the Global Warming Potential (IPCC 1990). A 'metric' establishes an 'equivalence' between an amount of CO2 emissions and other greenhouse gases (such as CH4), which can be used to design cost-effective mitigation strategies (Fuglestvedt et al 2003). As with many established concepts, there has been no shortage of critical views on GWPs (Plattner et al 2009, Shine 2009, Denison et al 2019). More recently, a method that relates emission rate changes of short-lived gases like methane to emissions of CO2 has been suggested, referred to as GWP* (Smith et al 2012, 2021, Allen et al 2016, 2018, Cain et al 2019). This method can usefully approximate the temperature implications of emission time series. Rather mistakenly, though, it has been suggested as an emission metric that can replace the widely used GWPs. The most recent WG1 IPCC report (IPCC 2021) presents GWP and GWP* both as metrics in the underlying chapter, although the Summary for Policymakers instead refers to GWP* and related methods as 'approaches'. Here, we examine how GWP* falls short on key criteria for an emission metric that can usefully be applied in real-world mitigation actions. We show that GWP* can exhibit the wrong sign in terms of the climate effects of a single year of emissions, and that aggregate emissions based on GWP* feature variability which would undermine the stability of any legal framework.
A few recent studies came up with the idea of making the stock pollutant CO2 comparable with flow pollutants like methane, calling that metric approach GWP* (see e.g. Smith et al 2021). In its simplest form, as initially presented by Allen et al (2018), the GWP* metric for short-lived forcers can be approximated by scaling GWP by its time horizon H, i.e. GWP* = GWP × H, so that the standard IPCC AR6 GWP-100 value of 27.2 (Forster et al 2021) for (biogenic) methane becomes a GWP*-100 of 2720. This much increased metric value, so the suggestion goes, is then applied to the change of emissions relative to previous years, rather than to the overall level of emissions. The benefit of this approach is that, when taken in aggregate and considered as a complete time series, GWP* emissions are a better predictor of global-mean temperature changes than GWP. Various modifications have been proposed since then, such as applying scenario-dependent adjustments by treating methane as 75% stock pollutant and 25% flow pollutant, or scaled versions thereof (Smith et al 2021).
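The difference between weighting emission levels (GWP) and weighting emission changes (the simplest GWP* form) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the methane series are hypothetical, and the code implements the simplified year-on-year form described above, not the refined versions of Cain et al (2019) or Smith et al (2021).

```python
# Simplest GWP* form described in the text: GWP* = GWP x H, applied to the
# year-on-year CHANGE in emissions rather than to the emission level.
GWP100_CH4 = 27.2   # IPCC AR6 GWP-100 for biogenic methane (value from the text)
HORIZON = 100       # time horizon H in years


def gwp_co2e(ch4):
    """Conventional aggregation: GWP-100 times the emission level of each year."""
    return [GWP100_CH4 * e for e in ch4]


def gwp_star_co2e(ch4):
    """Simplest GWP*: (GWP x H) times the change from the previous year.

    The first year has no predecessor, so the result is one element shorter.
    """
    factor = GWP100_CH4 * HORIZON  # 2720 for methane
    return [factor * (ch4[t] - ch4[t - 1]) for t in range(1, len(ch4))]


# Hypothetical methane emissions in Mt CH4 per year (illustrative only).
ch4 = [100.0, 101.0, 101.0, 99.0]

print(gwp_co2e(ch4))       # levels: all within about 2% of each other
print(gwp_star_co2e(ch4))  # changes: roughly +2720, 0 and -5440 Mt CO2e
```

In this made-up series the emission level moves by at most 2%, yet the GWP*-weighted values swing from strongly positive through zero to strongly negative.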
Given the sound theoretical basis (Smith et al 2021) and its usefulness for temperature projections (Lynch et al 2020), why is it that GWP* is nevertheless an inadequate metric for climate change mitigation?
To answer that, let us consider what emission metrics are for. Acknowledging that metrics have different use cases, including technological life cycle assessments and decarbonisation pathway analysis, we elaborate here on previously established 'functions' (Fuglestvedt et al 2003). We focus on the purpose of a metric in the context of continuously rolling annual emission assessments for facilities, sectors, countries or regions (Balcombe et al 2018) (see supplementary table S1 available online at stacks.iop.org/ERL/17/041002/mmedia). The two core 'functions' are to provide an 'equivalence' in terms of climate effects and (thereby) allow a cost-effective achievement of mitigation targets (Fuglestvedt et al 2003, Reisinger et al 2013). Specifically, we ask what features are essential for a metric that is proposed to replace GWP-100 (Lashof and Ahuja 1990) in the context of NDC target setting, inventory emission reporting, Art. 6 purposes and other applications under the Paris Agreement and related regional, national or local mitigation legislation, such as carbon markets. Such a metric should:
(a) act like a 'currency converter' or 'exchange rate' so that emissions of various greenhouse gases can be placed on a common scale and aggregated for multi-gas mitigation strategies, enabling a cost-effective mix of mitigation action. In that way, targets can be set for multiple greenhouse gases at once or emission trading systems can encompass multiple gases in so-called 'basket' approaches.
(b) approximate the marginal climate effect of an emission action (the climate effect of emitting one additional ton of a greenhouse gas, compared to a world in which that emission did not happen), so that a policy framework can appropriately reflect that externality.
(c) enable control feedback for policy instruments. For example, annual or quarterly updates of aggregate greenhouse gas (GHG) emission time series allow a country to check whether it is on track to meet its emission targets. Likewise, annual accounting of emissions in an emission trading system creates higher or lower prices, depending on whether emissions are high or low with respect to available emission permits. Those annually updated emissions hence constitute a control feedback for mitigation action. Excessive variability, i.e. 'noise', or too long a lag undermines this control feedback, as e.g. prices within an emission trading system would vary strongly from year to year or policy-makers would not be able to discern from the data whether emissions are overall on the right track.
(d) be consistent with the existing climate policy environment (ideally). Unless proponents of new metrics suggest overhauling existing reporting systems, emission trading systems, NDC targets and the Paris Agreement itself, a metric should be compatible with existing policy frameworks.
(e) provide a simple and transparent tool for non-specialists, so that a wide range of stakeholders can participate in mitigation actions (such as carbon markets), design them (on a regional and local level) and/or be informed observers of such policies and monitor their progress (Aamaas et al 2013).
The first two purposes are a slight re-arrangement of the previously mentioned 'functions', with an explicit acknowledgement that for a cost-effective mitigation target, the 'marginal climate effect' must be captured. The third purpose is a fundamental prerequisite for any optimisation or policy framework that is either intrinsically linked to feedback loops (like prices of emission certificates depending on the total emission level) or informally linked via, for example, feedback loops that inform a future commitment period's target in response to aggregate and near-real-time emission estimates. The above list is not exhaustive. Various design choices can be taken when deriving metrics, many of which are also more technical (Tanaka et al 2010). Of the proposed five key purposes, GWP* could meet the first one (based on the somewhat problematic long-term assumption that a new emission level is continued for a century, see below), but not the other four. In the following, we first consider the issue of variability (purpose (c)), which has so far received little attention in the literature, then consider the consistency with existing climate policy architecture (purpose (d)) and discuss the marginal climate effect (purpose (b)), before considering additional aspects.

Variability undermines control feedback
GWP* is ill-equipped to serve as a metric because of the variability in GWP* aggregated emission time series. This variability undermines the usability of GWP* aggregated emissions as a control feedback. All emissions are influenced by some interannual variability, e.g. harsh winters tend to increase emissions for heating in northern countries. That is one reason why the Kyoto Protocol had at least a five-year commitment period. If the feedback signal from the subject of control (GHG emissions) is too noisy, it is very hard to build a control mechanism. That is no different in climate policy than it is in signal control in electrical engineering.
To illustrate the issue, let us consider New Zealand, a developed country with well-developed emission inventories whose 1990-2018 emission trend happens to be almost the same when using GWP or GWP* (a −2% change when using GWP* calculated with a 20 year averaged CH4 time series, see figure 1(a) and supplementary). Yet, when using GWP* for emissions aggregation, the variability from year to year turns out to be enormous. In one year, GWP* aggregate emissions skyrocket and in a few other years a New Zealand government could theoretically claim to have reached net-zero emissions already (grey bands in figure 1(c)). This is not a reliable framework in which a clear signal leads to emissions being ramped down over time. To address this variability somewhat, Allen et al (2018) suggested taking the emission difference not to the previous year but to emissions 20 years earlier, and using only a 20th of the GWP* value (GWP × 100/20). Mathematically, that is equivalent to applying a 20 year rolling average over the methane emission time series (i.e. assigning year t the average of emissions from year t−19 to t), taking the difference from the previous year and applying the full GWP* value (GWP × 100). It turns out that, even then, the five-yearly or decadal variability of emission time series is much higher in GWP*-aggregate time series (pink dashed line in figure 1(c)) than under GWP-aggregate time series (blue solid line in figure 1). Even using this 20 year rolling average, New Zealand would have reported a 38% (using the Allen et al formula) or 29% (using the Smith et al version) decline in the three years from 1986 to 1989 for total GWP*-weighted emissions (at a time when GWP-weighted emissions rose 1.5%).
Similarly, New Zealand would have reported strong emission increases in the beginning of the 1990s and a 20%-26% decline over the 4 years from 2012 to 2016 (figure 1(c)) simply because of small variations in CH4 emissions (figure 1(b)) (with the 20%-26% range again being the result of changing between the Smith et al (2021) and Allen et al (2018) methods). Given that relative trends and variations in CO2 emissions related to transport, residential heating, industry and electricity supply are an order of magnitude smaller than these GWP*-weighted emissions variations, amplifying CH4-related variations via the use of GWP* would undermine any multi-gas policies and targets.
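The stated equivalence between the 20 year difference and the 20 year rolling average can be checked numerically. The sketch below is a minimal illustration, assuming a made-up methane series with wiggles of at most 2%; the function names are hypothetical and only the simplified Allen et al (2018) form from the text is implemented.

```python
GWP100_CH4 = 27.2   # IPCC AR6 GWP-100 for biogenic methane (value from the text)
HORIZON = 100       # time horizon H in years
WINDOW = 20         # averaging window in years


def gwp_star_lagged(ch4):
    """(GWP x H / 20) times the difference to emissions 20 years earlier."""
    factor = GWP100_CH4 * HORIZON / WINDOW
    return [factor * (ch4[t] - ch4[t - WINDOW]) for t in range(WINDOW, len(ch4))]


def gwp_star_rolling(ch4):
    """(GWP x H) times the year-on-year change of the trailing 20 year mean."""
    means = [sum(ch4[t - WINDOW + 1 : t + 1]) / WINDOW
             for t in range(WINDOW - 1, len(ch4))]
    factor = GWP100_CH4 * HORIZON
    return [factor * (means[i] - means[i - 1]) for i in range(1, len(means))]


# Hypothetical series: 1.0 Mt CH4 per year with deterministic wiggles of up to 2%.
ch4 = [1.0 + 0.004 * ((7 * t) % 11 - 5) for t in range(30)]

lagged = gwp_star_lagged(ch4)
rolling = gwp_star_rolling(ch4)

# Both formulations agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(lagged, rolling))
print(len(lagged), len(rolling), max_gap)

# GWP-weighted emissions stay near 27.2 Mt CO2e, while the GWP*-weighted
# values change sign from year to year despite the 20 year smoothing.
print(min(lagged), max(lagged))
```

Even with the smoothing, the effective weight on each difference is GWP × 100/20, five times the GWP-100 value, so small wiggles in this toy series still produce GWP*-weighted values of both signs.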
The fundamental reason for the noise is that GWP* makes a long and daring forward projection: every wiggle in the annual emission time series of short-lived forcers is implicitly projected to last for the next century, so annual wiggles are amplified directly by a factor of 100. While this feature of equating flow pollutants with stock pollutants is very useful for making global-mean temperature projections, it is a major bug for policy instruments. If a policy instrument attempts to control something that is inherently variable, the policy subject (in this case the country with its GHG emissions) jumps from being non-compliant with the targets in some years to overachieving any target in the next.

Inconsistency with existing NDCs, the Paris Agreement and emission trading systems
Aggregating historical CO2, methane and N2O emissions from 191 countries with both GWPs and GWP* metrics reveals how different the aggregated GHG time series are (figure 1(a)). For example, while the nominal range of Kyoto Protocol targets for the 2013-2020 period is from 78% (Monaco) to 99.5% (Australia) relative to 1990 (grey bar in figure 1(a)), GWP* aggregate metrics provide very different emission trends to what countries are used to. The trends are not 0.5% up or down, not even 15% up or down. For many countries, historical GWP*-aggregate emissions changes between 1990 and 2018 are 50%, 100% or even higher than the same trends calculated with GWP. Those wildly different emission trends are an enormous departure from any climate policy setting we know. In fact, under GWP*, the NDCs formulated for the future by 191 countries would have to be redone. Put simply, GWP* would ask countries to start from scratch in terms of their political target setting processes: a bold ask to policy makers.
Whether or not we agree with the 'inadvertent consensus' that the simple GWP metric has enjoyed since 1990 (Shine 2009), the fact is that hard-wrung policy architectures are now in place. The theoretical benefits of any GWP-alternative must be weighed against the political capital required to overhaul existing policies and targets. As a further example, GWP* would sit at odds with the carefully calibrated language of both Art. 2 and Art. 4.1 of the Paris Agreement. The Paris Agreement Art. 2 includes the possibility for a slight decrease of temperatures to pursue best efforts to limit warming to 1.5 °C after they have been kept 'well below' 2 °C, as both temperature levels can be understood to be a single goal (Rajamani and Werksman 2018). In addition, Art. 4.1, which asks for a balance of anthropogenic emissions and anthropogenic sinks, would produce such a slight cooling over time when GWPs are used (Tanaka and O'Neill 2018). Under a GWP* definition, however, net-zero emissions would only lead to a stabilization of temperatures at their peak level (Fuglestvedt et

Wrong sign for marginal climate effects
GWP* weighting of emissions could even feature the 'wrong sign'. Interestingly, some proponents of GWP* claim the opposite, i.e. that GWP results in the 'wrong sign' for warming under some circumstances.
Here we examine this claim. As discussed above, metrics should approximate the marginal climate effect of a given emission. As a result, the relevant questions are (or should be), 'If I emit this ton of substance X, how much more or less warming do I cause compared to a world in which I had not emitted anything?' And secondly, 'how does emitting this ton of substance X compare to emitting a ton of CO2 (with the impact of CO2 also being compared to a world in which no additional emissions would have occurred)?' Answering these questions without metrics requires three experiments: a reference experiment ('reference'), an experiment with an extra unit emission of substance X ('substance X increase') and an experiment with an extra unit of CO2 emissions ('CO2 increase'). The marginal warming caused by substance X is then the difference between the 'substance X increase' experiment and the 'reference' experiment. Similarly, the marginal warming caused by CO2 is the difference between the 'CO2 increase' experiment and the 'reference' experiment. Comparing the marginal warming of X and CO2 allows us to derive metrics, such as GWP or a global temperature potential (GTP). It is impossible for emissions of any warming agent, such as methane, to be associated with cooling or reduced warming when the focus is on the marginal contribution to climate change. Thus, any GHG metric value should have a positive sign. Of course, key value judgements (e.g. whether 20 or 100 years, radiative forcing or temperatures, integrated values or end points are considered) determine the magnitude of the metric values, but not the sign.
The only possible way to conclude that GWP results in the 'wrong sign' is to ask a different question, i.e. to not focus on marginal warming. The question which proponents of GWP* implicitly ask instead is: 'Given emissions of all species over all time, can I find a CO2 pathway which leads to the same warming?' This is obviously an interesting question (one which reduced complexity climate models are very well placed to answer) (Wigley 1998). However, it is inappropriate for assessing the climate impact of a single year or, say, a 5 year long commitment period's worth of emissions, because it folds the decreasing temperature contribution from past flow pollutant emissions into the effect of the emissions of interest. Notably, for a stakeholder with high historical CH4 emissions and somewhat lower current CH4 emissions, the waning temperature effect of the past will dominate the additional warming from current emissions. As a result, the current emissions are considered net negative in the GWP* framework. Yet current emissions still warm the planet compared to what would have happened without those emissions. Metrics should reflect this marginal/additional warming. Instead, GWP* folds the waning effect of past emissions into a metric-style assessment of the impact of future emissions. While GWP* is well suited to assessing the temperature effect of time series of emissions (and hence is an excellent model), it is ill-suited as an emission metric to approximate the marginal climate effect of GHG emissions from a particular year or, say, a 5 year long commitment period.
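The sign issue can be made concrete with a small worked example. This is a hedged sketch, not the authors' calculation: the stakeholder, the emission numbers and the function names are hypothetical, and the GWP* value uses the simplified 20 year form (GWP × 100/20 applied to the difference to emissions 20 years earlier).

```python
GWP100_CH4 = 27.2   # IPCC AR6 GWP-100 for biogenic methane (value from the text)
HORIZON = 100       # time horizon H in years
WINDOW = 20         # comparison lag in years


def gwp_co2e(e_now):
    """GWP-100 weighting: depends only on what is emitted this year."""
    return GWP100_CH4 * e_now


def gwp_star_co2e(e_now, e_past):
    """Simplified GWP* weighting: depends on the change relative to 20 years ago."""
    return GWP100_CH4 * HORIZON / WINDOW * (e_now - e_past)


# Hypothetical stakeholder: 100 kt CH4 per year two decades ago, 90 kt today.
e_past, e_now = 100.0, 90.0

print(gwp_co2e(e_now))               # positive: today's methane is counted as warming
print(gwp_star_co2e(e_now, e_past))  # negative: today's methane is booked as a removal
```

The 90 kt emitted today are still positive emissions of a warming agent, yet under this GWP* form they enter the accounts with a negative sign because the comparison to the stakeholder's own past dominates.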

Lack of metric neutrality, perverse incentives and a range of other shortcomings
For comparing the climate effect of the same absolute level of emissions, it should be irrelevant in which country, sector or facility these emissions occur. Such a 'metric neutrality' is desirable in an international policy setting, especially for greenhouse gases with lifetimes beyond approximately a year as their climatic effect does not depend on where the emission occurred. Whether to assign higher or lower emission allowances to a certain country or project because of its historical emission profile is the role of the policy framework and target setting process, but not the metric. GWP* however is not a 'neutral' metric as it weighs emissions differently depending on what the emission history of the country, project or facility has been.
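A minimal sketch of this lack of neutrality, assuming two hypothetical emitters with identical current methane emissions but different histories. The implied weight per tonne is an illustrative quantity introduced here (CO2-equivalent total divided by current emissions), and the GWP* value again uses the simplified 20 year form.

```python
GWP100_CH4 = 27.2   # IPCC AR6 GWP-100 for biogenic methane (value from the text)
HORIZON = 100       # time horizon H in years
WINDOW = 20         # comparison lag in years

# Hypothetical emitters: (emissions 20 years ago, emissions today), in kt CH4 per year.
emitters = {
    "A (historically high)": (60.0, 50.0),
    "B (historically low)": (40.0, 50.0),
}


def implied_weight_per_tonne(e_past, e_now, use_gwp_star):
    """CO2-equivalent credited per tonne of methane emitted today."""
    if use_gwp_star:
        total = GWP100_CH4 * HORIZON / WINDOW * (e_now - e_past)
    else:
        total = GWP100_CH4 * e_now
    return total / e_now


weights = {
    name: (
        implied_weight_per_tonne(past, now, use_gwp_star=False),
        implied_weight_per_tonne(past, now, use_gwp_star=True),
    )
    for name, (past, now) in emitters.items()
}

for name, (w_gwp, w_star) in weights.items():
    print(f"{name}: GWP weight {w_gwp:.1f}, GWP* weight {w_star:.1f}")
```

Under GWP both emitters' tonnes carry the same weight, whereas under this GWP* form the same tonne emitted today is weighted with opposite signs depending purely on who emits it.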
There is one other aspect that proponents of GWP* seem to be divided upon. While some seem to suggest that GWP* would be a way to elevate the importance of CO2 emission reduction more than is the case currently under GWPs (Pierrehumbert 2014), the practical effect of increasing the metric value of CH4 by a factor of 5 (effective metric value for the 20 year averaged time series) or 100 could do the opposite. More emphasis could be placed on CH4 mitigation in the near term, and some proponents even suggest that GWP* would negate any need for emission reductions of CO2: 'New Zealand could declare itself climate neutral almost immediately, well before 2050, and only because farmers were reducing their methane emissions' (Cain 2019).
Let us take GWP* for what it is: a new class of 'micro climate models' (MCMs) that should be welcomed in the hierarchy of climate models. There are now GWP* and the combined global temperature change potential (CGTP) (Collins et al 2020) formulas, which open the door for educational tools and various applications, if quick temperature projections are required from time series of emissions. And let us also be clear what GWP* and other so-called step-pulse metrics are not: metrics.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://zenodo.org/record/4479172#.YS23j9MzaBc.