Is the nitrogen footprint fit for purpose? An assessment of models and proposed uses

The nitrogen footprint has been proposed as an environmental indicator to quantify and highlight how individuals, organizations, or countries contribute to nitrogen pollution. While some footprint indicators have been successful in raising awareness of environmental pressures among the public and policy-makers, they have also attracted criticism from members of the life cycle assessment (LCA) community who find some footprints confusing and misleading as they measure substance and energy flows without considering their environmental impacts. However, there are also legitimate reasons to defend footprints as a useful class of indicators despite their incompatibility with LCA principles. Here, in light of this previous research and debate, we critically assess models and proposed uses for the nitrogen footprint, and explore options for further development. As the nitrogen footprint merely quantifies gross nitrogen emissions irrespective of time, location, and chemical form, it is a crude proxy of environmental and health impacts compared to other, more sophisticated environmental impact indicators. However, developing the nitrogen footprint toward LCA-compatible impact assessment would imply more uncertainty, more complexity, and more work. Furthermore, we emphasize that impact assessment has an unavoidable subjective dimension that should be recognized in any development toward impact assessment. We argue that the nitrogen footprint in its present form is already fit for some purposes, and therefore further development towards impact assessment may be unnecessary or even undesirable. For some uses it seems more important that the footprint has a clear physical meaning. We conclude that the best way forward for the nitrogen footprint depends crucially on what story it is used to tell.


Introduction
The nitrogen footprint has been proposed as an indicator to quantify and highlight how individuals, organizations, or countries contribute through their consumption to nitrogen pollution and thereby to impacts on the environment and human health. It is most commonly defined as the "total amount of Nr [reactive nitrogen, all other forms than N₂] released to the environment as a result of […] consumption" (Leach et al., 2012) and thus it is a proxy for the many and interrelated potential environmental and health impacts of nitrogen pollution (Galloway et al., 2003; Sutton et al., 2011; Erisman et al., 2013). The nitrogen footprint may prove successful in raising awareness among consumers and decision-makers, not least due to its seeming simplicity and its catchy name, familiar from better-known siblings like the ecological, carbon, and water footprints. In fact, it has been suggested as an important member of the "footprint family" (Galli et al., 2012; Leach et al., 2012), a combination of different footprints intended to measure impacts more comprehensively than any single indicator could.
However, footprint indicators have been criticized from the field of life cycle assessment (LCA) for failing to give relevant and comprehensive information about environmental impacts (Ridoutt et al., 2015a, b). A central principle in LCA is to distinguish the life cycle inventory (LCI) from the life cycle impact assessment (LCIA): the LCI maps substance and energy flows while the LCIA quantifies resulting impacts on the environment or human well-being, ideally using a comprehensive but non-overlapping set of impact indicators that allows the audience to assess trade-offs between different types of impacts. The LCA community has contended that some footprints look more like inventory results than impact indicators and consequently risk confusing and misleading their audience. For example, the water footprint fails to reflect actual impacts if it aggregates water use in different locations without accounting for regional variation in water scarcity (Pfister and Hellweg, 2009; Ridoutt and Huang, 2012). The water footprint has also been criticized for conflating different types of impacts when it sums rainwater, pumped irrigation water, and even hypothetical volumes of polluted water into one number (Ridoutt and Huang, 2012; Pfister and Ridoutt, 2014). A related problem from an LCA perspective is the double-counting that occurs when indicators have overlapping scopes. For example, the ecological footprint accounts for some greenhouse gas emissions, so that reporting it alongside the carbon footprint leads to double-counting (Galli et al., 2012; Ridoutt and Pfister, 2013b). Thus, judging by LCA standards, footprints can look like a "minefield" (Ridoutt et al., 2015a) of incoherent definitions, overlapping scopes, and limited environmental relevance (Ridoutt and Pfister, 2013b; Fang and Heijungs, 2015a).
Despite these perceived shortcomings, footprints have gained much popularity, and the LCA community has responded by trying to formalize and define footprints that conform to LCA principles, yet appeal to the intended audience of existing footprints (Ridoutt and Pfister, 2013b; Fang and Heijungs, 2015b). Specifically, a task force from the UNEP-SETAC Life Cycle Initiative has suggested that footprints can be allowed to have overlapping scopes since they aim to address the concerns of a non-technical audience, rather than support a comprehensive assessment of impacts and trade-offs (Ridoutt et al., 2015b). Nevertheless, the task force argued that footprints can and should build on the same life cycle perspective as traditional LCA indicators do, and specifically recommended a set of common ground rules for all footprint indicators to follow (Ridoutt et al., 2015a). Thus, the task force welcomes a set of footprint indicators to complement traditional LCA indicators, though not for use in comprehensive impact assessment, and only provided that they conform to certain LCA principles.
This attempt to reconcile LCA and footprinting does not settle the debate, though. There is a considerable audience in both research and policy communities who use and support footprint indicators that are incompatible with LCA principles; sometimes perhaps due to ignorance of the potential pitfalls, but there are also sound arguments why LCA-incompatible footprints could be preferable (Fang and Heijungs, 2014; Fang et al., 2016; van Dooren et al., 2017). For example, the original water footprint proponents have replied with fundamental objections to the LCA approach: that it ignores certain important perspectives in its construction of impact indicators, that the LCIA-compatible water footprint lacks physical meaning, and even that LCA methodology might be aiming for the impossible when aggregating different impacts into a single number (Hoekstra et al., 2009; Hoekstra and Mekonnen, 2012a; Hoekstra, 2016). As we will further expand on in this paper, disagreement over footprint indicators has partly originated in technical details, but perhaps even more in different ideas about what meaning and purpose the footprint should have.
Therefore, as the nitrogen footprint is put forth as a tool to analyze and inform about nitrogen pollution, we find it timely to critically assess models and proposed uses for the nitrogen footprint and to do so in light of previous footprint research. Several recent papers have reviewed and discussed the various ways footprints can be defined and used (Fang and Heijungs, 2015b; Fang et al., 2016; Laurent and Owsianiak, 2017), and this research suggests that much conflict and confusion is the result of methods and purposes that are poorly defined or poorly aligned. In this paper, we aim to (1) recapitulate some of the arguments surrounding other footprints, focusing on the carbon and water footprints, and discuss whether and how these arguments apply to the nitrogen footprint; (2) review models and proposed uses for the nitrogen footprint; and (3) evaluate how well current nitrogen footprint models are aligned with proposed uses.

Method
The nitrogen footprint is a recent invention compared to the more established carbon, water, and ecological footprints. These other footprint indicators have attracted much debate over their meaning, purpose, and usefulness. We expected that some lessons learned from these debates would also apply to the nitrogen footprint. Therefore we studied literature on the relationship between LCA and footprints, on the footprint family, and on specific examples from the carbon and water footprints. We paid special attention to the concept known in LCA as environmental relevance, which has been a central point of contention. Regarding the relationship to LCA and the footprint family, we studied especially the following publications: Galli et al. (2012); Fang et al. (2014, 2015); Fang and Heijungs (2015b); Fang et al. (2016); Ridoutt and Pfister (2013b); Ridoutt et al. (2015a, b); Laurent and Owsianiak (2017). Regarding the water footprint, we found the following publications useful as they clearly demonstrate different views on the meaning, purpose, and usefulness of the water footprint: Pfister and Hellweg (2009); Hoekstra et al. (2009, 2011); Hoekstra and Mekonnen (2012a); Ridoutt and Huang (2012); Chapagain and Tickner (2012); Boulay et al. (2013); Chenoweth et al. (2014); Pfister and Ridoutt (2014); Hoekstra (2016); Pfister et al. (2017). Regarding the carbon footprint we specifically studied how it aggregates greenhouse gases with different atmospheric lifetimes (Shine, 2009; Persson et al., 2015; Ridoutt et al., 2015a; Frischknecht and Jolliet, 2016; Reisinger et al., 2017). What we present in this paper is by no means a review, but a selection of issues we found relevant for the nitrogen footprint.
We studied literature on nitrogen footprints with two specific aims: (1) to compare models used to calculate footprints and (2) to map proposed uses and how these uses have been discussed. We compared the models with respect to system boundaries, scope of activities, modeling approach and assumptions, data requirements, and whether and how results were disaggregated. We scanned the literature for proposed uses of the nitrogen footprint and also for viewpoints on potential uses and limitations.
Guided by the lessons learned from other footprints, we critically assessed how well suited the nitrogen footprint is to its proposed uses. We found that each proposed use is implicitly associated with a different story, conceptualized as a combination of meaning (what the nitrogen footprint represents) and purpose (to what end it is suited). We then considered how these different stories put different requirements on the nitrogen footprint, and how well it lives up to these requirements.
We studied publications that referred to the nitrogen footprint as first proposed by Leach et al. (2012). A few useful overviews on models and results already exist (Galloway et al., 2014; Shibata et al., 2017; Erisman et al., 2018). Our aim is not to supplant these, but to provide additional perspectives that we find missing or insufficiently explored. The syntheses by Galloway et al. (2014) and Shibata et al. (2017) mainly focused on results of nitrogen footprint studies and the paper by Erisman et al. (2018) mainly on comparing nitrogen footprints to other nitrogen pollution indicators. In contrast, we focused on comparing models and proposed uses within the set of studies reporting nitrogen footprints.

Results and discussion
This part of the paper is organized as follows. Section 3.1 describes lessons learned from other footprints, with a special focus on the concept known in LCA as environmental relevance. We describe what environmental relevance is and show how it relates to different views on the appropriate meaning and purpose of a footprint indicator. In subsection 3.1.3 we discuss how the lessons learned apply to the nitrogen footprint. Keeping these lessons in mind, we then compare a set of nitrogen footprint models (Section 3.2) and critically assess their proposed uses (Section 3.3).

The criterion of environmental relevance
One of the general criteria for footprints proposed by the footprinting task force of the UNEP-SETAC Life Cycle Initiative (Ridoutt et al., 2015a, b) is that footprints be environmentally relevant. Environmental relevance is a core idea in LCA, explained by the task force as follows: "When aggregating data, having common units is necessary, but not sufficient; environmental equivalence is needed. To illustrate, it would not be environmentally meaningful to aggregate emissions of different greenhouse gases without first applying factors […] describing the relative global warming potentials. Similarly, to assess the environmental performance of consumptive water use along a supply chain it is necessary to apply a model which accounts for differences in local water availability." (Ridoutt et al., 2015a). Environmental relevance is so central because it is needed to create "a consistent logic whereby a smaller value is always preferable to a higher value" (Ridoutt et al., 2015a). To be compatible with LCA, a footprint should give sufficient information to choose between two products, with respect to the topic that the footprint concerns.
The water footprint illustrates how some footprints are at odds with the environmental relevance criterion. The common idea of a water footprint quantifies three "colors" of water: green water which is consumption of rainwater, blue water which is consumption from surface water and groundwater, and gray water which is a hypothetical water quantity needed to dilute pollutants to acceptable concentrations. The water footprint can also be spatially and temporally disaggregated (Hoekstra et al., 2011), but in practice it is often reported as a single number. From an LCA perspective this may look like a failed attempt to make an impact indicator: Ridoutt and Pfister (2010) wrote that "it is not clear what good would result from choosing a product or production system on the basis of it having a lower water footprint. Indeed, a product with a lower water footprint could be more damaging to the environment than one with a higher water footprint depending upon where the water is sourced." A better solution from an LCA perspective (Ridoutt and Pfister, 2010) is a water footprint that does not add up green and blue water use and that accounts for water scarcity, and thus looks more like a proper LCIA indicator.
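The ranking reversal described by Ridoutt and Pfister (2010) can be illustrated with a minimal sketch. It compares a purely volumetric blue water footprint with a scarcity-weighted one. The products, volumes, and scarcity weights are hypothetical placeholders chosen only to show the effect, and the simple multiplicative weighting is an assumption rather than the characterization model of any cited study.

```python
from dataclasses import dataclass


@dataclass
class WaterUse:
    region: str
    blue_m3: float   # consumptive blue water use per unit of product (hypothetical)
    scarcity: float  # dimensionless scarcity weight for the region (hypothetical)


def volumetric_footprint(uses):
    """Inventory-style footprint: plain sum of water volumes."""
    return sum(u.blue_m3 for u in uses)


def scarcity_weighted_footprint(uses):
    """Impact-style footprint: volumes weighted by local scarcity."""
    return sum(u.blue_m3 * u.scarcity for u in uses)


# Product A uses much water where it is abundant,
# product B uses less water where it is scarce.
product_a = [WaterUse("water-rich region", 100.0, 0.1)]
product_b = [WaterUse("water-scarce region", 40.0, 0.9)]

for name, uses in [("A", product_a), ("B", product_b)]:
    print(name, volumetric_footprint(uses), scarcity_weighted_footprint(uses))
```

With these assumed numbers, product B has the lower volumetric footprint but the higher scarcity-weighted footprint, so the two indicators would suggest opposite choices.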

Environmental relevance is not necessarily the aim
But outside the LCA community the absence of scarcity correction is not always seen as a mistake. While Hoekstra and Mekonnen (2012a) have acknowledged that "reducing the aggregate WF [water footprint] in environmentally stressed catchments deserves priority", they also emphasized that priorities need to be formulated with "a variety of considerations, including local environmental impact, global sustainability, equity, and economic efficiency". The water footprint can be seen as an aid in those judgements, especially when it is disaggregated spatially (Hoekstra and Mekonnen, 2012b), temporally (Hoekstra et al., 2012), and by "color" (Hoekstra, 2016). This reflects a qualitatively different view of the footprint's purpose: not to conclusively decide which products or production systems are preferable, but to illustrate how, where, and for whom water resources are used.
In addition, Hoekstra (2016) has questioned, in a well-articulated and detailed paper, whether it is even possible to summarize in one number the range of effects on human well-being that result from depletion and pollution of water resources, in different locations and at different times, concluding that "the LCA methodology may run against the limits of what is possible, given the complexity of the socio-ecological system." An equally well-articulated and detailed reply from Pfister et al. (2017) addressed both parts of the critique, partly by arguing that LCA better achieves what Hoekstra is looking for, and partly by arguing that Hoekstra is looking for the wrong thing. Without going into the excruciating detail of the arguments, we conclude that much of the conflict is rooted in diverging ideas about what meaning and purpose the water footprint should have.
The issues of spatial and temporal variability in scarcity and the qualitative differences between the three water colors apply by analogy to the nitrogen footprint. For nitrogen, there is a corresponding spatial and temporal variability in impacts depending on where and when emissions occur, and qualitative differences in the impacts of different chemical forms. This is further discussed in Section 3.1.3.

Environmental relevance is subjective
There is no such thing as objective environmental relevance since there is no objective comparison of impacts in one time or location against another. The lack of objective relevance is even more apparent for indicators that concern a set of qualitatively different impacts. For example, it has been debated whether and how the water footprint should aggregate depletion and degradation of water resources into a single number (Ridoutt and Pfister, 2013a; Hoekstra, 2016), and similar problems arise with the nitrogen footprint, which aggregates nitrogen pollution of different chemical forms, in different locations, and at different times.
Therefore, any work towards environmentally relevant indicators is problematic if it overlooks the issue of embedded value judgements. Value judgements are necessary for environmental relevance since the requirement that "a smaller value is always preferable to a higher value" (Ridoutt et al., 2015a) presupposes a meaning of "preferable". But nothing is preferable in its own right; preferences are held by someone. Unfortunately it is not often stated whose preferences are to be represented, or why. The UNEP-SETAC footprint task force claims that the focus of a footprint is defined by stakeholders in society and emphasizes that the footprint should correspond to the expectations and language of those stakeholders (Ridoutt et al., 2015b). However, at least in the case of the water footprint it appears to be primarily a debate between scientific experts that determines the scope of the footprint.
The carbon footprint illustrates that environmental relevance is not guaranteed even if there is objective equivalence in some technical sense. Although it is widely accepted to report carbon footprints aggregating different greenhouse gases with different lifetimes using their 100-year global warming potential (GWP100), and although this practice was used as an example of environmental relevance by Ridoutt et al. (2015a), it must be recognized that these units are impact equivalents only in a limited technical sense. Each greenhouse gas has different dynamics in the atmosphere and therefore different effects over time. There are several other greenhouse gas metrics that are equally "correct" from a scientific standpoint, yet can suggest different priorities for climate change mitigation (Shine, 2009; Persson et al., 2015; Reisinger et al., 2017). For example, the GTP100 metric, which instead measures temperature change after 100 years, has rather different equivalence factors for the most important greenhouse gases. Indeed, the wide adoption of the GWP100 metric is seemingly an "inadvertent consensus" brought about by a sort of scientific and political convenience (Shine, 2009). As this consensus has been challenged, the UNEP-SETAC Life Cycle Initiative has recently suggested always using both the GWP100 and GTP100 metrics in order to give a more nuanced picture of climate impacts (Frischknecht and Jolliet, 2016). These examples demonstrate that it is ultimately an arbitrary choice how to aggregate qualitatively different impacts in a single indicator.
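The effect of the metric choice can be sketched numerically. The example below aggregates the same two emission profiles with two sets of equivalence factors. The methane factors are of the order reported for GWP100 and GTP100 in the IPCC Fifth Assessment Report, but they are included here only as illustrative assumptions and should be replaced by values from an authoritative source. The two product profiles are entirely hypothetical.

```python
# Illustrative equivalence factors (kg CO2-equivalents per kg of gas).
# Assumed values for demonstration only; not taken from the cited papers.
GWP100 = {"CO2": 1.0, "CH4": 28.0}
GTP100 = {"CO2": 1.0, "CH4": 4.0}


def co2_equivalents(emissions_kg, factors):
    """Aggregate emissions of several gases into one number using the given metric."""
    return sum(mass * factors[gas] for gas, mass in emissions_kg.items())


# Hypothetical emission profiles (kg of each gas per unit of product).
product_x = {"CO2": 1000.0, "CH4": 0.0}
product_y = {"CO2": 500.0, "CH4": 30.0}

for label, factors in [("GWP100", GWP100), ("GTP100", GTP100)]:
    x = co2_equivalents(product_x, factors)
    y = co2_equivalents(product_y, factors)
    preferred = "X" if x < y else "Y"
    print(f"{label}: X={x:.0f}, Y={y:.0f}, lower footprint: {preferred}")
```

Under these assumptions the methane-intensive product Y has the higher footprint with the GWP100 factors but the lower footprint with the GTP100 factors, which is the kind of divergence in priorities discussed above.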

Implications for the nitrogen footprint
What lessons learned from other footprints can we apply to the nitrogen footprint? We find support at least for the following three points.
First, if the nitrogen footprint is to be used as an LCA-compatible impact indicator, it should incorporate existing research on the many and qualitatively different impacts of nitrogen pollution. A nitrogen footprint defined as "the total amount of [reactive nitrogen] released to the environment" (Leach et al., 2012) aggregates nitrogen emissions in different locations, at different times, and of different chemical forms. One scientific reason to make such aggregation could be the high mobility and reactivity of nitrogen in the environment, which means that a single nitrogen atom over time may contribute to a range of environmental issues, far from the time and place it was originally released (Galloway et al., 2003; Sutton et al., 2011; Erisman et al., 2013). But despite this complexity, much progress has been made towards prediction of how the chemical form, location and timing of nitrogen emissions determine impacts. For example, the large spatial variation in nitrogen retention (denitrification to NOₓ, N₂O, or unreactive N₂, and long-term storage in the environment) has been subject to much research and is an important factor to consider when designing policies to control eutrophication and climate change (Grizzetti et al., 2015; Hansen et al., 2017). Timing and chemical forms also matter, for example as seen in simulations of how agricultural ammonia emissions in continental Europe contributed substantially to an intensive episode of harmful airborne particulate matter in the UK (Vieno et al., 2016). The LCA community is working to make spatially differentiated impact indicators relevant for nitrogen pollution, for example the acidification potential (Roy et al., 2014) and the eutrophication potential (Henryson et al., 2017). If the aim is to develop the nitrogen footprint towards impact assessment, it seems constructive to make use of these efforts.
Second, since nitrogen contributes to different impacts in different times and locations, if the aim is to reach environmental relevance it needs to be clarified whose values and priorities are to be represented in the footprint. There may be several concrete ways forward that stakeholders find acceptable. One possible approach is to weigh different impacts based on expert opinion, another to elicit weighting factors from a broader audience, e.g., based on economic valuation of impacts (for nitrogen-specific examples, see, e.g., Brink et al., 2011; Compton et al., 2011; van Grinsven et al., 2013, 2018). But whatever path is chosen, it must be recognized that environmental relevance is inherently subjective, and therefore the formulation of an impact indicator will always partly be an act of persuasion. The most stringent requirement on environmental relevance one can hope to meet is that an indicator is, to most stakeholders, reasonably representative of their concerns.
Third, it is by no means necessary to aim for environmental relevance. For the purposes of drawing attention to pollution and resource consumption and illustrating the central role of consumption choices, the ecological and water footprints have clearly been successful despite concerns over lacking environmental relevance (Chenoweth et al., 2014; Fang et al., 2014; Hoekstra and Wiedmann, 2014). An explicit non-goal of environmental relevance may even be preferred; for example, if one does not believe in the mere possibility of an empirically robust relationship between the nitrogen footprint and human well-being or ecosystem quality, it may be better to have an indicator that at least has a clear physical interpretation (Fang and Heijungs, 2015a; Hoekstra, 2016). From this perspective, perhaps the best possible nitrogen footprint is an inventory of emissions disaggregated spatially, temporally, by chemical form, and possibly in other physically meaningful ways.

Nitrogen footprint models
As shown in the previous section, the intended meaning and purpose of the nitrogen footprint plays a crucial role in determining how it should be defined. Fig. 1 illustrates that different definitions can have different proximity to the impacts that humans care about. For example, a nitrogen footprint defined as the total amount of reactive nitrogen released to the environment is rather far removed from the impacts. This may be more or less of a problem depending on the intended meaning and purpose of the indicator. This perspective guided our comparison of the different nitrogen footprint models (this section) and our critical assessment of the proposed uses of the nitrogen footprint (Section 3.3).
The nitrogen footprint models we studied vary widely in approach, system boundaries, scope of activities, data requirements, disaggregation level, assumptions, and not surprisingly also in results. These method choices determine how work-intensive the calculation is, how much data is needed, and how uncertain the results are. These differences are summarized in Table 1 and in the following paragraphs. As Table 1 shows, each model involves a long list of choices to be made, sometimes quite subtle ones. A downside of these differences is that they limit the comparability across studies, but the upside is that they explore and quantify the importance of method choices. Especially useful are those studies that carry out consistency checks and sensitivity analyses by varying assumptions or data sources (Leach et al., 2012; Leip et al., 2014b; Shibata et al., 2014; Oita et al., 2016b).
Two main approaches to calculate footprints are (1) bottom-up, i.e., aggregating emissions calculated using activity data (food consumption, fossil fuel combustion, etc.) and estimated emission factors; or (2) top-down, i.e., disaggregating statistics on nitrogen turnover as far as possible between different economic sectors. Due to lack of data and other difficulties, most studies have used some combination of these two approaches (see also Galloway et al., 2014; Shibata et al., 2017; Erisman et al., 2018). A notable example of a hybrid approach is the N-Calculator, launched by Leach et al. (2012) and applied with various adjustments by others (Pierer et al., 2014; Stevens et al., 2014; Shibata et al., 2014, 2017). The N-Calculator estimates most emissions using a bottom-up approach (for example, agricultural emissions based on fertilizer recommendations, typical crop yields and livestock feed rations, etc.) but adds in a top-down estimate of nitrogen emissions related to goods and services based on an input-output model. This approach makes sense considering its purpose, namely to let individuals quantify their personal footprint, as it enables a fine-grained breakdown of the most substantial nitrogen emission sources. However, it lacks consistency and comprehensiveness compared to, e.g., the top-down approach by Oita et al. (2016a), covering 188 countries using a multiregional input-output model.
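The difference between the two approaches, and how a hybrid combines them, can be sketched as follows. This is a simplified illustration under stated assumptions, not the N-Calculator or any other published model: the function names, food items, emission factors, sector totals, and population are all hypothetical placeholders, and the top-down part simply shares sector totals equally per capita instead of using an input-output model.

```python
# All numbers are hypothetical placeholders, not values from any cited study.

def bottom_up_footprint(activity, emission_factor):
    """Bottom-up: sum of activity data times emission factors (kg N per person per year)."""
    return sum(amount * emission_factor[item] for item, amount in activity.items())


def top_down_footprint(sector_emissions_kg_n, population):
    """Top-down (simplified): national sector totals shared equally among the population."""
    return sum(sector_emissions_kg_n.values()) / population


# Bottom-up part: personal food consumption (kg food per year)
# and assumed nitrogen losses per kg of food consumed.
food_kg_per_year = {"beef": 20.0, "legumes": 10.0}
kg_n_lost_per_kg_food = {"beef": 0.5, "legumes": 0.05}

# Top-down part: assumed national emission totals for goods and services (kg N per year).
goods_and_services_kg_n = {"goods": 3.0e6, "services": 1.0e6}
population = 1_000_000

food_part = bottom_up_footprint(food_kg_per_year, kg_n_lost_per_kg_food)
goods_part = top_down_footprint(goods_and_services_kg_n, population)
hybrid_footprint = food_part + goods_part

print(f"food (bottom-up): {food_part:.1f} kg N per person per year")
print(f"goods and services (top-down): {goods_part:.1f} kg N per person per year")
print(f"hybrid total: {hybrid_footprint:.1f} kg N per person per year")
```

The sketch shows why the bottom-up part allows a breakdown by individual consumption choices, whereas the top-down part yields only an average that is the same for every person.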
System boundaries and scope of activities vary considerably between the surveyed models. For example, Gu et al. (2013) covered all major nitrogen emissions within China but no emissions outside China, i.e., they took a national production perspective on emissions. In contrast, Oita et al. (2016a) took a strict consumption perspective, mapping the global nitrogen emissions associated with all consumption in each country. The N-Calculator (Leach et al., 2012) in principle maps the nitrogen emissions associated with an individual's consumption, but due to method and data limitations in practice does not account for emissions related to imported products. An improved variant of the model was proposed by Shibata et al. (2014) to account for Japan's large food and feed trade, and more generally it is envisaged that the N-Calculator could use a multi-regional input-output model to fully account for trade-related emissions and thereby reach the ultimate goal of connecting consumer choices to nitrogen pollution (Leach, A., personal communication, April 2018). Other models cover only certain products but in greater detail: for example eleven major food categories in EU countries (Leip et al., 2014b), organic and conventional milk in Sweden (Einarsson et al., 2018), and seven categories of seafood in Japan (Oita et al., 2016b).
As concluded in Section 3.1.3, disaggregating the nitrogen footprint spatially, temporally, or by chemical form can be a useful step regardless of whether the aim is an LCA-compatible impact indicator. Some of the studied models already disaggregate nitrogen emissions (see Table 2). An example of spatial disaggregation is the multi-regional input-output model by Oita et al. (2016a), which estimates not only the quantity of nitrogen emissions caused by each country's consumption but also in which country the emissions occur. Thus it could be shown that many countries have substantial net trade of embodied pollution compared to their total footprint. There are also a few examples of disaggregation by chemical form or loss pathway, for example distinguishing leaching and runoff (mostly NO₃⁻) from gaseous losses of NH₃, N₂O and NOₓ (Leip et al., 2014b; Oita et al., 2016a; Guo et al., 2017). However, temporal disaggregation below annual level still seems to be missing in the literature, although this could be useful information given the episodic character of some pollution problems like airborne particulate matter (Vieno et al., 2016).
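As a minimal sketch of what a disaggregated footprint could look like as data, the example below stores emissions as records tagged by region and chemical form and sums them along either dimension. The record layout, region names, and quantities are hypothetical placeholders and do not reproduce the data structure of any surveyed model.

```python
from collections import defaultdict

# Hypothetical emission records: (region where emitted, chemical form, kg N).
emissions = [
    ("country A", "NH3", 4.0),
    ("country A", "NO3", 6.0),
    ("country B", "NH3", 1.0),
    ("country B", "N2O", 0.5),
    ("country B", "NOx", 2.5),
]


def aggregate(records, key_index):
    """Sum kg N over all records, grouped by the field at key_index (0=region, 1=form)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


by_region = aggregate(emissions, 0)
by_form = aggregate(emissions, 1)
total_footprint = sum(kg_n for _, _, kg_n in emissions)

print("by region:", by_region)
print("by chemical form:", by_form)
print("aggregate footprint:", total_footprint)
```

Keeping the records disaggregated loses nothing, since the single-number footprint can always be recovered by summation, whereas the reverse is not possible.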
We finally mention two technical details where the surveyed models differ. The first is that we found several different and sometimes poorly described methods for handling systems that co-produce multiple products (Table 1). Ideally, the chosen methods (e.g., allocation or system expansion) should at least be clearly defined since they play an important role in determining the footprints of some products (Weidema et al., 2008; Weidema and Schmidt, 2010; Pelletier et al., 2015; Baldini et al., 2017).
The second detail is nitrogen-specific, namely that denitrification to unreactive N₂ gas is treated in different ways. Note that the following discussion does not apply to partial denitrification to NOₓ or N₂O, as the nitrogen then remains reactive, moving through the nitrogen cascade (Galloway et al., 2003). This discussion concerns N₂ denitrification, which can be counted as a negative emission of reactive nitrogen (Leach et al., 2012). While everyone seems to agree that N₂ emissions from wastewater treatment plants must not be counted towards the nitrogen footprint, there are different approaches to the N₂ denitrification occurring in agricultural soils (Table 1). The different approaches may be defended using different perspectives on system boundaries: since the nitrogen footprint is defined in terms of nitrogen "released to the environment" (Leach et al., 2012), it matters how one defines the boundary between the environment and the non-environment. For example, wastewater treatment is reasonably not seen as part of the environment and therefore denitrification in wastewater treatment is seen as avoided emissions. In contrast, N₂ denitrification in rivers and lakes clearly occurs in the environment and therefore it seems agreed that such natural denitrification should not be deducted from nitrogen footprints. But N₂ denitrification in agricultural soil is somewhere in between: it may be seen either as an avoided stream of nitrogen pollution (like in wastewater treatment) or as a natural removal process (like in rivers and lakes). Most of the surveyed papers do not discuss denitrification much at all (for example, the word does not even occur in the agriculturally oriented paper by Einarsson et al., 2018), but we think this issue deserves more attention.

Demonstrating the importance of diets and consumption
The main purpose of the N-Calculator (Leach et al., 2012) was "to help consumers understand their role in nitrogen losses to the environment", but the authors also pointed to a broader audience of "the public, policymakers, and governments". They pointed out that the nitrogen footprint is merely an estimate of total nitrogen pollution, not a measure of actual environmental impacts: "[t]here is […] a wide variation in […] environmental consequences, which are determined by the way in which the Nr is lost: to the air (as NH₃, NOₓ, or N₂O) or to the ground and surface water (as NH₄⁺ or NO₃⁻). Depending on the loss route and form, the [nitrogen] will have different consequences to the environment." Their list of proposed uses was correspondingly modest and can be summarized as educating the public and policy-makers by providing order-of-magnitude information about the nitrogen pollution associated with different categories of goods and services. This use case has also been promoted and demonstrated by several others (e.g., Leip et al., 2014b; Galloway et al., 2014; Westhoek et al., 2015).
Comparing food categories (e.g., dairy products, beef, chicken, legumes and vegetables) in terms of nitrogen footprint is an exercise with striking results: there may well be a whole order of magnitude difference between different categories. While the pollution may occur in different locations and at different times, at least the composition of different chemical species (nitrate, ammonia, nitrogen oxides) seems broadly similar between different categories of food (Leip et al., 2014b). Hence, it would not be unreasonable to think that some food categories really are an order of magnitude "worse" than others, almost irrespective of what meaning is put in the word "worse". In other words, no complicated impact assessment model is needed to support the order-of-magnitude message about the relative importance of different food categories iterated by many researchers (Leach et al., 2012; Leip et al., 2014b; Pierer et al., 2014; Galloway et al., 2014; Westhoek et al., 2015; Shibata et al., 2017).

Food product labeling
As a next step towards concrete consumer information, Leach et al. (2016) have proposed "a comprehensive environmental impact food label that assesses a food product's sustainability in terms of its energy, nitrogen, and water use" using a combination of the carbon, nitrogen, and water footprints. Such a label could convey quite different messages depending on how the footprint is presented. Here are a few examples of design choices highlighted by Leach et al.:
• The label can either report numerical values or translate the numerical values into an ordinal scale, such as a four-step "stars label" (0-3 stars) or a three-colored "stoplight label". Leach et al. noted that this choice affects the perceived complexity of the label.
• The label can present the information either in absolute terms or relative to some benchmark. Reporting in absolute terms, i.e., the footprint value, is straightforward but leaves the difficult task of interpretation to the consumer. In contrast, reporting in relative terms, for example compared to an average diet or a "sustainable daily footprint" (Leach et al., 2016), is a way to provide some context and send an implicit message about how the footprint should be interpreted.
• The label should facilitate comparisons between products, but depending on the design it may point more to comparisons within a category (e.g., different brands or types of vegetables) or between categories (e.g., meat versus vegetables).

Fig. 1. Examples of possible indicators along the causal chain from emissions to impacts. While moving closer to impacts may seem attractive to increase the relevance of the indicator, it must also be acknowledged that this requires a range of scientific questions to be solved and a range of subjective decisions to be made regarding what counts as relevant. Note that the figure merely lists some examples; a comprehensive treatment would be vastly more complicated (Sutton et al., 2011; Erisman et al., 2013).

Table 1
Summary of nitrogen footprint models in system boundaries, scope of activities, modeling approach, data requirements, disaggregation level and assumptions. See text (Section 3.2) for details.
[1] Food part calculated assuming 100% domestic production. Direct energy and transport emissions based on average per-capita consumption. Indirect energy and transport emissions based on average per-capita domestic emissions.

Although labeling food products with footprints may be seen as a natural next step after providing generic order-of-magnitude information about food categories, it is potentially quite different to label food products since it implicitly tells a more precise story, one where individual products or brands can be reliably benchmarked against each other. Leach et al. (2016) made a valuable contribution in discussing design choices, and our view is that these choices make a large difference precisely because they determine what implicit story the label would tell:
• We find it deeply problematic if the label is promoted as a "comprehensive environmental impact food label" for the reasons outlined in Section 3.1.3. The nitrogen footprint in its current form is not fit for the purpose of impact assessment. First, there is a long list of difficult scientific questions on the relationship between nitrogen pollution and its many and diverse impacts on the environment and human health; and second, even if those scientific issues can be resolved, there are still difficult value judgements involved when aggregating various impacts into a single numerical indicator.
• In any case, a numerical comparison is fundamentally different from a comparison on an ordinal scale such as a stars label or stoplight label. An ordinal scale tells a different story since (1) it effectively rounds the numerical value to lower precision, and (2) it does not imply a direct equivalence relation since, e.g., stoplight colors cannot be added or subtracted.
• If the footprint is reported relative to a benchmark, the choice of benchmark makes a crucial difference for the story. For example, it sounds relevant to compare to something like a "sustainable daily footprint" (Leach et al., 2016) (and there are other similar proposals, e.g., Fang et al., 2015; Laurent and Owsianiak, 2017) but it would require a great deal of scientific interpretation and value judgements to quantify the word "sustainable". In contrast, an objective benchmark such as the average per-capita footprint radically reduces the embedded value judgements at the expense of potentially reduced environmental relevance.
• Gaining broad acceptance for an environmental food label would likely be difficult. The label would have to be perceived as fair, rigorous, precise enough, and well documented. The necessary level of precision would depend on (1) how much precision the label design signals (e.g., whether it is a numerical or ordinal value) and (2) whether the label would be geared towards comparison within product groups (e.g., different brands of the same product) or comparison between product groups (e.g., meat versus vegetables). In either case, we believe that substantial work on clarification and standardization (e.g., with respect to the method differences outlined in Section 3.2 and Table 1) would be needed before such a label could be broadly accepted.
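The design choices discussed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the product names, the footprint values (imagined as g N per serving), the star thresholds, and the two benchmark values are hypothetical and do not come from Leach et al. (2016) or any other source.

```python
from typing import Dict, Sequence


def stars_label(footprint: float, thresholds: Sequence[float]) -> int:
    """Map a footprint to an ordinal star rating (lower footprint, more stars).

    One star is awarded for each threshold the footprint does not exceed, so
    three thresholds give a four-step scale of 0-3 stars.
    """
    return sum(footprint <= t for t in thresholds)


def relative_to_benchmark(footprint: float, benchmark: float) -> float:
    """Express a footprint as a fraction of a chosen benchmark value."""
    return footprint / benchmark


# Hypothetical footprints and thresholds, for illustration only.
products: Dict[str, float] = {"A": 12.0, "B": 19.0, "C": 21.0}
thresholds = (30.0, 20.0, 10.0)

stars = {name: stars_label(value, thresholds) for name, value in products.items()}
print(stars)  # {'A': 2, 'B': 2, 'C': 1}

# The same product reads very differently against two hypothetical benchmarks.
average_benchmark = 20.0      # assumed average per-serving footprint
sustainable_benchmark = 10.0  # assumed "sustainable" level, a value judgement
vs_average = relative_to_benchmark(products["B"], average_benchmark)
vs_sustainable = relative_to_benchmark(products["B"], sustainable_benchmark)
```

In this sketch, products A and B differ by 7 units yet share a rating, while B and C differ by only 2 units and land in different classes, which is the rounding effect of an ordinal scale. Likewise, product B is just below the assumed average benchmark but almost twice the assumed "sustainable" one, so the benchmark choice alone changes the story the label tells.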

Nitrogen offsetting
An idea explored and demonstrated by Leip et al. (2014a) is to do nitrogen offsetting, comparable to carbon offsetting schemes where companies or individuals help to finance projects that reduce greenhouse gas emissions, so as to compensate for emissions elsewhere. Leip et al. emphasized that nitrogen offsetting is "more difficult to conceptualize and calculate" than greenhouse gas offsetting, but the principle is the same: after avoiding nitrogen pollution as much as possible, the remaining pollution may be offset through pollution savings at another time and place. The demonstration project collected money from participants of a conference to help finance improvement of nitrogen use efficiency in a village cluster in Uganda, so as to reduce nitrogen footprints from its agricultural production in equal amounts as the nitrogen footprint of the conference meals.
According to Leip et al. (2014a), the demonstration was successful in proving that nitrogen offsetting "can be applied to a major scientific conference to raise awareness, reduce the conference's N footprint, and demonstrate that real compensation of Nr releases is possible." However, several challenges have also been highlighted. The difficulty in measuring and comparing different impacts was highlighted by Leip et al. (2014a) themselves, and also by Reis et al. (2016), who noted that "compensating at a distinctively different entity will not remove local or regional effects, unless the spatial resolution of compensation matches the respective environmental effect." Nevertheless, Reis et al. defended the concept since "[t]he major merit of compensation, however, consists of awareness raising to demonstrate how much effort is needed to compensate for a specific adverse human action."
In contrast to food labeling, nitrogen offsetting is necessarily based on a calculation directly involving the nitrogen footprint. Thus, it implicitly tells a story of equivalence, a story that one pollution stream really can be compensated by reducing another. Even if researchers know this to be false, or at least vastly more complicated than that, there is a risk that nitrogen offsetting creates confusion in the nontechnical audience that is supposedly the target of the awareness-raising.
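The implied equivalence can be illustrated with a small sketch. All quantities are hypothetical and chosen only to make the point; they do not describe the demonstration project of Leip et al. (2014a). The sketch compares an offset on total nitrogen mass with the residual that remains when the same flows are disaggregated by chemical form.

```python
from typing import Dict

# Hypothetical releases in kg N, split by chemical form (illustrative values only).
conference_meals: Dict[str, float] = {"NH3": 40.0, "NO3": 50.0, "N2O": 2.0, "NOx": 8.0}
offset_project_savings: Dict[str, float] = {"NH3": 70.0, "NO3": 30.0, "N2O": 0.0, "NOx": 0.0}


def total_n(releases: Dict[str, float]) -> float:
    """Aggregate footprint: sum of all forms, as in the basic footprint definition."""
    return sum(releases.values())


def residual_by_form(emitted: Dict[str, float], saved: Dict[str, float]) -> Dict[str, float]:
    """Per-form difference between emitted and saved nitrogen."""
    return {form: emitted[form] - saved.get(form, 0.0) for form in emitted}


# On total mass, the offset balances exactly.
net_total = total_n(conference_meals) - total_n(offset_project_savings)

# By chemical form, it does not: some forms are overcompensated, others untouched.
residual = residual_by_form(conference_meals, offset_project_savings)
print(net_total)  # 0.0
print(residual)   # {'NH3': -30.0, 'NO3': 20.0, 'N2O': 2.0, 'NOx': 8.0}
```

With these assumed numbers the aggregate footprint is fully offset, yet the nitrate, N2O and NOx releases are only partly compensated or not compensated at all, and the sketch does not even represent differences in location and timing. A net footprint of zero therefore says nothing by itself about whether the impacts are equivalent.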

Demonstrating the importance of nitrogen embedded in trade flows
Products can be said to embody the pollution caused by their production. Thus, international trade embodies nitrogen pollution, and the nitrogen footprint is one way to quantify this relation between countries. The most comprehensive assessment of nitrogen pollution embodied in international trade so far is the top-down calculation by Oita et al. (2016a), tracing embodied nitrogen in trade between 188 countries using a trade database covering 15,000 economic sectors. The authors explicitly aimed to influence policy by showing that some countries are responsible for substantial nitrogen pollution in other countries, and called for policies with "global coverage and reach".
The main story conveyed by the nitrogen footprint here is that demand in some countries drives pollution in other countries, and that the magnitude of these effects is often quite substantial. The authors explicitly referred to the footprint as a measure of total pollution, not environmental impacts. Hence, we perceive this application as a legitimate and useful variation of the order-of-magnitude information described above.

Table 2
Examples of approaches to disaggregation of footprints, showing analogies between disaggregation of nitrogen and water footprints. Disaggregating the nitrogen footprint can be seen as a useful step regardless of whether the end goal is an LCA-compatible impact indicator (see Section 3.1.3 and Section 3.2).

The nitrogen footprint as a research tool
While the nitrogen footprint has mostly been proposed to educate about nitrogen pollution and/or to influence consumer decisions, some have also pointed towards uses within the scientific community. For example, Leip et al. (2014b) wrote that "[d]ifferences in N footprints between countries of similar food products might serve as benchmarking and indication of potential improvement", but they proposed to proceed cautiously "since methodological issues might explain some of the differences, as well as differences in production conditions that are difficult to change, such as climate or soil conditions." In a similar vein, Shibata et al. (2017) aimed to "propose possible options for reducing anthropogenic N pollution to the environment based on N footprint results", but also indicated that the footprint has limitations, not least that it aggregates nitrogen pollution of different chemical forms and in different locations.
In summary, the nitrogen footprint is not primarily proposed as a research tool, and most researchers seem to perceive challenges in using it to understand and mitigate environmental impacts.

Conclusions
The LCA community has reacted strongly to footprint indicators that are in conflict with the principles of LCA and in that sense fail to provide environmentally relevant information (Section 3.1). This paper demonstrates that the nitrogen footprint is open to the same type of criticism, since nitrogen pollution causes different types and amounts of damage depending on where, when and in what chemical form the release occurs. Hence, from an LCA perspective the nitrogen footprint somehow needs to be changed so that its units are environmentally equivalent, i.e., to become more like an impact indicator. But this paper also demonstrates, using concrete nitrogen footprint use cases and analogies with other footprints, that there are legitimate reasons to avoid such development of the nitrogen footprint.
The factors speaking against the LCA approach are of both technical and conceptual nature (Fig. 1). On the technical side, it is a formidable scientific task to account for all the ways that nitrogen pollution impacts the environment and human well-being, and therefore moving towards impact assessment would imply more uncertainty, more complexity, and more work. On the conceptual side, we have emphasized that environmental relevance is fundamentally a normative concept: there is no objectively correct way to quantify the array of different impacts associated with the nitrogen cycle. This is often not explicitly acknowledged, but past debates over the water footprint (Section 3.1) demonstrate that normative claims are a rich source of disagreement.
The main reason speaking for a development in the direction of an impact indicator is that the footprint might be used and interpreted as such anyway (Table 3). The story that implicitly follows when the footprint is used to compare different consumption alternatives is that the footprint does guide towards preferred choices. For example, it would not be strange for consumers to think that a product with three-star rating is clearly preferable to one with only two stars (Section 3.3.2), even if the fine print clarified that the rating is a measure of potential pollution rather than potential impacts. Thus, it would also be legitimate to aim for environmental relevance as best as it can be supported by a combination of scientific knowledge and dialogue with relevant stakeholders.
If the aim is to develop the nitrogen footprint into an impact indicator, a first step might be to disaggregate nitrogen losses by chemical form as already demonstrated by some (see Table 1), or perhaps to use some of the spatial and temporal modeling tools that are available but have not yet made it to the nitrogen footprint literature (Section 3.1.3). Such improvements would help researchers to better understand the links between consumption and nitrogen pollution. However, it would be misguided to think that more scientific knowledge and more sophisticated models will remove the inherent difficulty in constructing a meaningful indicator of nitrogen-related environmental impacts. To solve that task, there is a need to explicitly discuss which value judgements would be acceptable to embed in that indicator. Such a discussion should include relevant stakeholders outside academia and should be held with specific reference to proposed applications of the footprint. Otherwise, when stakeholders realize that a purportedly objective footprint is actually an implicit representation of individual researchers' opinions, it risks undermining the credibility of the research community and the nitrogen footprint.
We believe that it would be useful to discuss further what audience and what uses the nitrogen footprint is intended for. Each possible use of the nitrogen footprint tells a certain story. Each story is associated with different implicit claims to environmental relevance. Strictly speaking, it is neither necessary nor possible to reach perfect environmental relevance, so the real challenge is rather to construct a nitrogen footprint that is sufficiently meaningful for its intended uses: a nitrogen footprint that is fit for purpose. Lessons learned from other footprints may be useful to understand whether and how such definitions can be found. The carbon footprint, commonly expressed in units of GWP100 equivalents, is widely accepted although it is not objectively correct and it has been suggested to complement it with other metrics (Section 3.1.2). In contrast, the water footprint is an enduring matter of disagreement due to a combination of technical difficulties in assessing environmental effects and profound disagreement about what is relevant and accessible information for the target audience.
To summarize, this paper proposes some points to keep in mind for further development and use of the nitrogen footprint: (1) environmental relevance is ultimately a subjective concept and therefore it is a matter of persuasion what counts as "good enough"; (2) improving the environmental relevance of the nitrogen footprint is difficult both because nitrogen-related impacts are scientifically hard to quantify and because the footprint covers a whole array of qualitatively different concerns; (3) substantial lessons learned are available from ongoing debates over other footprints; and (4) the right level of environmental relevance depends crucially on the story that the footprint is used to tell.

Table 3
Uses for the nitrogen footprint suggested in the literature, along with our assessment of the environmental relevance they require. Some use cases are more problematic than others since they invite or require a more quantitatively precise interpretation which is not warranted using such a rough proxy of impacts.