Quantifying sustainable intensification of agriculture: The contribution of metrics and modelling

Sustainable intensification (SI) of agriculture is a promising strategy for boosting the capacity of the agricultural sector to meet the growing demands for food and non-food products and services in a sustainable manner. Assessing and quantifying the options for SI remains a challenge due to its multiple dimensions and potential associated trade-offs. We contribute to overcoming this challenge by proposing an approach for the ex-ante evaluation of SI options and trade-offs to facilitate decision making in relation to SI. This approach is based on the utilization of a newly developed SI metrics framework (SIMeF) combined with agricultural systems modelling. We present SIMeF and its operationalization approach with modelling and evaluate the approach’s feasibility by assessing to what extent the SIMeF metrics can be quantified by representative agricultural systems models. SIMeF is based on the integration of academic and policy indicator frameworks, expert opinions, as well as the Sustainable Development Goals. Structured along seven SI domains and consisting of 37 themes, 142 subthemes and 1128 metrics, it offers a holistic, generic, and policy-relevant dashboard for selecting the SI metrics to be quantified for the assessment of SI options in diverse contexts. The use of SIMeF with agricultural systems modelling allows the ex-ante assessment of SI options with respect to their productivity, resource use efficiency, environmental sustainability and, to a large extent, economic sustainability. However, we identify limitations to the use of modelling to represent several SI aspects related to social sustainability, certain ecological functions, the multi-functionality of agriculture, the management of losses and waste, and security and resilience. 
We suggest advancements in agricultural systems models and greater interdisciplinary and transdisciplinary integration to improve the ability to quantify SI metrics and to assess trade-offs across the various dimensions of SI.

* Corresponding author at: Copernicus Institute of Sustainable Development, Utrecht University, Princetonlaan 8a, 3584 CB Utrecht, the Netherlands. E-mail address: mouratiadou@zalf.de (I. Mouratiadou).


Keywords: Sustainable intensification; Indicators; Metrics; Modelling of agricultural systems; Ex-ante scenario assessment; Sustainable development goals

Introduction
Sustainable Intensification (SI) of agriculture is being promoted by a growing number of international organizations (FAO, 2017;2011;2004;UN, 2012;USAID, 2012) and other commissions and focus groups concerned with the sustainability of agricultural production (e.g., Baulcombe et al., 2009;Beddington et al., 2012;Buckwell et al., 2014;Elliott et al., 2013;Foresight, 2011) as a promising strategy to increase the productivity and sustainability of the agricultural sector simultaneously. Despite the broad appeal and growing momentum of the general concept of SI (Weltin et al., 2018;Wezel et al., 2015), its aims remain under debate, and there is still no consensus on its exact definition or the practices to achieve it (Franks, 2014;Garnett et al., 2013;Garnett and Godfray, 2012;Godfray and Garnett, 2014;Loos et al., 2014;Mahon et al., 2018;Pretty and Bharucha, 2014;Rockström et al., 2017).
Most definitions of SI emphasize i) the achievement of more agricultural output for fewer inputs and land and ii) enhancing the sustainability of agricultural production, with a notable emphasis on environmental sustainability (e.g., Garnett et al., 2013;Pretty et al., 2011;Tilman et al., 2011). The increasing consensus that wider considerations should be encompassed in the definition of SI (De Koeijer et al., 2002;Elliott et al., 2013;FAO, 2004;Garnett et al., 2013;Petersen and Snapp, 2015;Pretty and Bharucha, 2014;Reardon et al., 1999) permits the evolution towards a broader definition. This definition incorporates social and economic aspects, such as social equity and nutrition, rural development (Pašakarnis et al., 2013), the economic viability of agriculture (Ruben and Lee, 2000;The Montpellier Panel, 2013), and the quality of life of society (National Research Council, 2010). Additionally, it reflects the multi-functionality of agricultural systems by considering diverse agricultural production outputs, including food and non-food products (Harvey and Pilgrim, 2011;National Research Council, 2010) and ecosystem services (Pretty et al., 2011;Wezel et al., 2015).
Therefore, a holistic view of SI includes agronomic, environmental, social, and economic aspirations (Gliessman, 2014;Struik and Kuyper, 2014) that aim to boost the capacity of the agricultural sector to satisfy the growing demands for diverse products and services, enhance its economic and resource use efficiency, and strengthen social rural structures (Garnett et al., 2013;Godfray and Garnett, 2014). Within this broad definition, the evaluation of different SI options and the identification of potential trade-offs across the various dimensions of SI are necessary. Indeed, numerous studies discuss the technologies and practices of SI (Foley et al., 2011;Matson et al., 1997;Pretty et al., 2011;Pretty, 1997;Weltin et al., 2018), the relevant scales (Gunton et al., 2016;Weltin et al., 2018), and how SI differs from existing agricultural practices and other intensification approaches (Mueller et al., 2012;Petersen and Snapp, 2015;Tittonell, 2014;Wezel et al., 2015). Diverse SI options are proposed ranging from agronomic developments at the farm scale (e.g., adapted cropping) to regional integration actions (e.g., diffusion of innovation), while it is recognized that there is no 'one size fits all' solution (Weltin et al., 2018). The debates demonstrate the complex and diverse effects and potential trade-offs associated with SI and highlight that the success of SI options and their extrapolation in different production systems and locations depends on the specific context. Although the need to evaluate and assess trade-offs across the various dimensions of SI to reach an informed consensus is acknowledged (Struik and Kuyper, 2014), the development and quantification of SI metrics and the identification of successful SI options remain a challenge (Barnes and Thomson, 2014;Firbank et al., 2013;Gadanakis et al., 2015;Garnett and Godfray, 2012;Petersen and Snapp, 2015;Smith et al., 2017;Struik and Kuyper, 2014).
With respect to the development of SI metrics, we argue that there is a need for a framework that is holistic, generic, and policy-relevant at the same time. A holistic SI metrics framework encompassing the different agronomic, environmental, social, and economic dimensions of SI allows the systematic consideration of the effects and trade-offs across these various dimensions. A generic framework compatible with the diverse agricultural contexts, scales, and aims of SI allows consistency, efficiency, and comparability across different cases without compromising the complex case- and scale-specific nature of agro-ecosystems. Finally, a policy-relevant framework provides a unified view on SI that integrates insights from the academic literature, stakeholder perspectives, and the current policy agenda (e.g., the Sustainable Development Goals, SDGs; United Nations, 2015). Thus, it can increase the consistency across the science-policy interface and the compatibility with changing policy goals and key societal challenges.
Two recent studies by Smith et al. (2017) and Mahon et al. (2018) make a valuable contribution to the development of SI metric frameworks by adopting an overall holistic approach. However, the derived metric lists are elicited via either stakeholder consultations (Mahon et al., 2018) or academic publications (Smith et al., 2017), resulting in only partial views. Additionally, as concluded by the authors, the suggested metrics in the study of Mahon et al. (2018) place greater emphasis on agricultural production and ecological considerations, as opposed to the social and cultural dimensions of agricultural systems. Finally, both studies lack explicit references to the policy relevance of the metrics contained in the frameworks and information regarding how the proposed metrics could be operationalized and quantified in practice.
For the quantification of SI metrics and the evaluation of SI options, several approaches are available that can be distinguished, based on the considered time horizon, into ex-ante or ex-post assessments. Ex-post assessments based on the use of empirical data (e.g., Barnes and Thomson, 2014;Firbank et al., 2013;Gadanakis et al., 2015) provide interesting insights into the performance of SI options but are limited to situations in which empirical data have been collected and to the analysis of already implemented strategies. These types of assessments cannot analyze SI options that are not currently in place or may perform differently in the future due to external influences (e.g., climate change) or policy changes. Therefore, ex-ante quantitative assessments are promoted as useful tools to improve our understanding of how to achieve SI and increase the evidence base underlying proposals on future policies before their implementation (Reidsma et al., 2018). Such assessments can enable policy-makers to make better informed decisions (Reidsma et al., 2018) and consequently increase the efficiency and effectiveness of policies (van Ittersum et al., 2008). Agricultural systems modelling, which is based on the use of agro-ecological and agro-economic models often used in an integrated manner (e.g., Belhouchette et al., 2011;Mouratiadou et al., 2010;Reidsma et al., 2015;Ruane et al., 2018), is a typical ex-ante assessment approach. However, although many models span scales and disciplines to generate the values of metrics, it is questioned whether key processes are successfully included and represented (Kanter et al., 2018), and challenges continue to exist regarding their use in policy processes (Reidsma et al., 2018). In this context, there is a need to explore to what extent the different dimensions of SI are considered by the currently available agricultural systems models.
Our paper contributes to the development and operationalization of approaches to evaluate SI options in diverse contexts with the aim of facilitating decision making in relation to SI. First, we propose an approach based on the use of a newly developed SI metrics framework (SIMeF) together with agricultural systems modelling for the ex-ante assessment of SI options, driven by the hypothesis that this is a powerful approach to aid SI-related decision making. SIMeF is designed as a holistic, generic, and policy-relevant framework based on the integration of academic and policy indicator frameworks, expert opinions, and the SDGs. Accompanied by guidelines for its operationalization regarding the selection of metrics and their quantification, it provides a comprehensive dashboard to facilitate case- and context-specific selection of SI metrics on agronomic, environmental, social, and economic dimensions of SI. This selection is ideally performed together with stakeholders, and the selected metrics are then quantified by modelling.
Second, to test our hypothesis and understand potential gaps in the SIMeF operationalization approach with modelling, we evaluate to what extent the SIMeF metrics can be quantified by representative agricultural systems models. In doing so, we identify the strengths and weaknesses of modelling for improving systems understanding and facilitating decision making via the ex-ante assessment of SI options.

Development of the SIMeF
As a first step for the development of the SIMeF, we conducted a review of the SI literature to obtain an overview of the definitions and aims of SI and specify the scope of the SIMeF accordingly (see details on the review, including the search terms, in Section 1 of Appendix A). In agreement with the increasing consensus in the literature towards a broader definition of SI (see Section 1), we opted for the inclusion of the agronomic, environmental, social, and economic dimensions of SI in the SIMeF.
During the literature review, we found that the hierarchical levels of different indicator frameworks naturally vary, but they usually adopt most of the four hierarchical levels shown below or very similar ones (see Section 2 of Appendix A for details on hierarchical levels of other studies) which we, thus, also adopted in our study to structure the SIMeF:
• Domains: Higher level of the hierarchy (e.g., environmental sustainability) relating to the analytical purpose of the SIMeF.
• Themes: More concrete agronomic, environmental, economic, and social aspects (e.g., biodiversity, trade, equity).
• Sub-themes: Sub-division of the above-mentioned themes into more specific sub-categories (e.g., species richness, self-sufficiency, income distribution).
• Metrics: Concrete measurable quantities specifying the often generic sub-themes (e.g., cereal import dependency ratio, GINI coefficient). Depending on their source, these metrics are more or less concrete.
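The four-level hierarchy can be pictured as a simple nested data structure. The following is only a minimal sketch: the class names, the `count_levels` helper, and the assignment of the example themes to domains (in particular placing equity under social sustainability) are illustrative assumptions, not the layout of the actual SIMeF spreadsheet.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubTheme:
    name: str
    metrics: List[str] = field(default_factory=list)


@dataclass
class Theme:
    name: str
    sub_themes: List[SubTheme] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    themes: List[Theme] = field(default_factory=list)


def count_levels(domains):
    """Return (n_domains, n_themes, n_sub_themes, n_metrics)."""
    themes = [t for d in domains for t in d.themes]
    sub_themes = [s for t in themes for s in t.sub_themes]
    metrics = [m for s in sub_themes for m in s.metrics]
    return len(domains), len(themes), len(sub_themes), len(metrics)


# Illustrative fragment built from the examples in the text.
# The domain assigned to each theme is an assumption made for this sketch.
example = [
    Domain("Environmental sustainability", [
        Theme("Biodiversity", [SubTheme("Species richness")]),
    ]),
    Domain("Economic sustainability", [
        Theme("Trade", [SubTheme("Self-sufficiency", ["Cereal import dependency ratio"])]),
    ]),
    Domain("Social sustainability", [
        Theme("Equity", [SubTheme("Income distribution", ["GINI coefficient"])]),
    ]),
]

print(count_levels(example))  # (3, 3, 3, 2)
```

Applied to the full framework, the same count would be expected to return the totals reported for the SIMeF (seven domains, 37 themes, 142 sub-themes, 1128 metrics).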
From the outset, based on our review of the SI literature, we specified seven domains for structuring the SIMeF (see Section 1 of Appendix A for details). Four of these domains relate mainly to its agronomic dimension and the intensification of production (the domains of operating conditions, inputs, outputs, and input-output relationships). The other three domains reflect the environmental, economic, and social sustainability of agriculture (the domains of environmental, economic, and social sustainability). We present these domains in detail in Section 3.1.
Our second step involved reviewing the selected indicator frameworks (Table 1) with the aim of identifying the metrics typically considered in policy and academic arenas and integrating them into the SIMeF. The list of reviewed frameworks was agreed upon by the paper authors due to the prominence and popularity of these frameworks in the science-policy communities and their relevance to the topic of SI (see Table A3 of Section 2 in Appendix A for further details on how the selection of each of these frameworks is justified). We included i) five frameworks that are representative of agricultural policy priorities (policy-relevant), ii) two frameworks with a particular focus on complementing modelling approaches (modelling-related), and iii) five frameworks with a focus on the measurement of SI and the evaluation of SI options (SI-focused). Using the hierarchical structure of domains, themes, sub-themes, and metrics, we compiled the metrics from all reviewed frameworks into the SIMeF. This step was implemented by mapping the metrics of each framework to the seven SIMeF domains while creating intermediate themes and sub-themes to cluster them into smaller groups (see Section 2 of Appendix A for details).
Third, we organized a scientific workshop on SI metrics to present a preliminary version of the SIMeF. The workshop was held on 22 May 2017 as a pre-conference meeting at the MACSUR Science Conference 2017 (22-24 May 2017, Berlin, Germany). The conference had a strong focus on modelling the agro-environment and was therefore appropriate for the purposes of this work, as it attracted scientists with long experience in agricultural production and sustainability, as well as modelling. The workshop attracted 41 participants who formed an interdisciplinary group with diverse backgrounds (e.g., soils, agricultural economics, climate change) and experience levels (from PhD students to professors) (see also Table A5 in Appendix A for their countries and affiliations). During the workshop, we asked experts to identify metrics that are highly relevant to policy aims related to SI (e.g., food security) and to provide suggestions for improving the SIMeF. We added the identified metrics into the SIMeF by mapping them to the corresponding domain, theme and sub-theme. We invite readers to refer to Section 3 in Appendix A for a detailed description of the workshop setup.

Table 1
Overview of indicator frameworks selected for review for assessing the SI of agricultural systems.

Policy-relevant
- The SAFA framework is developed by the SAFA initiative to provide universal guidelines for assessing sustainability in different contexts and working areas.
- The FSD is a metrics list resulting from a project commissioned by the British Government Office for Science to explore the new science needed to "meet the challenges of producing more food more sustainably".

Modelling-related
- The Goal-Oriented Framework (GOF) (Alkan Olsson et al., 2009): The GOF was developed within the frame of the 'System for Environmental and Agricultural Modelling; Linking European Science and Society' (SEAMLESS) project (van Ittersum et al., 2008) focusing on multi-scale modelling approaches.
- The Food and Nutrition Security metrics (FNS) (Rutten et al., 2018;Zurek et al., 2018; 2017): The FNS metrics framework was developed as part of the 'Metrics, Models and Foresight for European Sustainable Food and Nutrition Security' (SUSFANS) project, which has modelling approaches at its core.

SI-focused
- Socio-Ecological Systems framework (SES) (Mahon et al., 2018): One of the most comprehensive frameworks with a focus on SI; its SI metrics were collected via semi-structured interviews with 32 stakeholders from throughout the United Kingdom (UK) agrifood system.
- Africa-focused SI Indicator List (ASIIL): Composed via a comprehensive review of SI metrics, this work proposed SI metrics related to African smallholder farming systems at different scales.
- The Montpellier Panel's model of SI (MPM) (The Montpellier Panel, 2013): A theoretical model of SI, which highlights its core principles as a new paradigm to tackle food insecurity.
- The BioSight decision support tool (BioSight) (Zurek et al., 2015): A shortlist of SI indicators and metrics is proposed.
- Land Use Policy Group SI measurement indicators (LUPG): A selection of indicators and metrics measured to assess whether different farm types in the UK have achieved SI over a certain period of time.

Fourth, we integrated into the SIMeF indicators from the SDG indicator framework (United Nations, 2018) to reflect sustainable development priorities. The SDGs represent a universal call to action for achieving sustainable development, constituting an excellent representation of international policy aims. We drew the linkages between the SIMeF and the SDG indicator framework by identifying the SDG indicators that we evaluated as being most relevant to the SI of agriculture (i.e., indicators related to themes already included in the SIMeF). We identified SDG2 "End hunger, achieve food security and improved nutrition and promote sustainable agriculture" as particularly relevant, so all indicators listed under this goal were considered. For all other SDGs, only the indicators directly relevant to the SIMeF domains were considered.
Once the SIMeF was finalized, we identified its most prominent sub-themes, i.e., i) those for which there was consensus between the participants of the SI metrics workshop, some of the reviewed frameworks, and the SDGs or ii) those mentioned by at least half of the reviewed frameworks.
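The two prominence criteria amount to a simple decision rule, sketched below. The function name and arguments are hypothetical, "some of the reviewed frameworks" is read here as "at least one", and the default of 12 reviewed frameworks is taken from the 5 + 2 + 5 frameworks listed in the second step; none of this is the authors' own implementation.

```python
def is_prominent(in_workshop, in_sdgs, n_frameworks_mentioning, n_frameworks_reviewed=12):
    """Return True if a sub-theme meets either prominence criterion.

    i)  consensus: named by the workshop, by the SDGs, and by at least one
        reviewed framework (assumed reading of "some of the reviewed frameworks");
    ii) majority: named by at least half of the reviewed frameworks.
    """
    consensus = in_workshop and in_sdgs and n_frameworks_mentioning >= 1
    majority = 2 * n_frameworks_mentioning >= n_frameworks_reviewed
    return consensus or majority


# Hypothetical sub-themes with made-up mention counts.
print(is_prominent(True, True, 1))    # True  (criterion i)
print(is_prominent(False, False, 6))  # True  (criterion ii: 6 of 12)
print(is_prominent(True, False, 5))   # False (neither criterion met)
```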

Development of the SIMeF operationalization approach
Regarding the operationalization approach of the SIMeF, we adopted the phases of agricultural trade-off analysis suggested by Kanter et al. (2018) and adjusted them to the application of the SIMeF with agricultural systems modelling. The latter is based on the use of agro-ecological and agro-economic models, often operated in an integrated manner, and is introduced in Section 2.3.

Assessment of quantifiability of SIMeF metrics by agricultural systems modelling
To assess the extent to which SIMeF metrics can be quantified using agricultural systems modelling for the ex-ante assessment of SI options, we used the set of models from the research project 'Assessing options for the SUSTainable intensification of Agriculture for integrated production of food and non-food products at different scales' (SUSTAg). This project includes a large set of representative agro-ecological and agro-economic models operating at various scales (Table 2) and is representative of the model types often used in agricultural systems research (e.g., Humpenöder et al., 2018;Popp et al., 2017;Rodríguez et al., 2019;Ruiz-Ramos et al., 2018).
Agro-ecological simulation models are based on the mathematical representation of biophysical processes and relationships in agroecosystems, typically focusing on the interaction between crops and soils considering the effects of climate and/or management at varying levels of detail. With their process-based nature, they capture the relevant agro-ecosystem dynamics and feedback loops between processes and output indicators. Usually, they explain the impact of climatic variables (e.g., temperature, precipitation, atmospheric CO2 concentration), crop and soil management (e.g., irrigation, fertilization, tillage, sowing date and density, harvesting, residue management) and genetic characteristics (e.g., plant traits) on a set of outputs, such as agricultural yields, greenhouse gas (GHG) emissions and nutrient losses (e.g., nitrate leaching).
Agro-economic models provide information on economic decisions, land-use patterns, and agricultural production systems and are often based on the assumption of rational utility- or profit-maximizing agents. Both socio-economic and bio-physical data are used as input to these models to create a decision context informed by actual conditions. Biophysical data are often derived from agro-ecological models. Spatial coverage, represented processes, and definition of the objective function and implemented constraints differ by model and context of the application. While partial equilibrium models solve for the economic market equilibria of supply and demand in the agricultural sector, farm-level models optimize farm management and production decisions considering factors such as the costs, revenues, and resources of the farm. Farm-level models are especially suitable for testing different options available to the farmer, whereas equilibrium models can capture market feedback.
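To make the idea of a farm-level optimization concrete, the sketch below maximizes the gross margin of a two-crop farm subject to land and labor constraints by enumerating whole-hectare allocations. It is a toy illustration only: the crops, margins, labor requirements, and the brute-force search are assumptions chosen for readability and do not represent any of the models discussed here, which typically rely on mathematical programming solvers and far richer constraint sets.

```python
def best_plan(land_ha, labor_h, crop_a, crop_b):
    """Return (ha_a, ha_b, profit) for the most profitable feasible plan.

    Enumerates integer hectare allocations; a plan is feasible if it uses
    no more land and labor than the farm has available.
    """
    best = (0, 0, 0)
    for ha_a in range(land_ha + 1):
        for ha_b in range(land_ha - ha_a + 1):
            labor = ha_a * crop_a["labor_h_per_ha"] + ha_b * crop_b["labor_h_per_ha"]
            if labor > labor_h:
                continue  # labor constraint violated
            profit = ha_a * crop_a["margin_per_ha"] + ha_b * crop_b["margin_per_ha"]
            if profit > best[2]:
                best = (ha_a, ha_b, profit)
    return best


# Hypothetical coefficients, invented for this example.
wheat = {"margin_per_ha": 400, "labor_h_per_ha": 10}
potato = {"margin_per_ha": 900, "labor_h_per_ha": 40}

print(best_plan(50, 1100, wheat, potato))  # (30, 20, 30000)
```

Changing a constraint (for example, the available labor) and re-running the search mimics, in miniature, how such models are used to test options available to the farmer.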
The SUSTAg modelling teams were asked which of the SIMeF sub-themes i) are quantifiable as output of their model or ii) can be considered as input into their model (e.g., in the form of scenarios).
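The answers to these two questions can be tabulated per sub-theme, as in the following sketch. The model names, sub-theme assignments, and the precedence rule (a sub-theme counts as "output" if any model can quantify it, otherwise as "input" if any model can take it as input) are assumptions made for illustration and do not reproduce the survey results.

```python
from collections import Counter


def classify_sub_themes(sub_themes, responses):
    """Label each sub-theme as 'output', 'input', or 'not covered'.

    Assumed precedence (not stated in the text): 'output' wins over 'input'.
    """
    status = {}
    for sub_theme in sub_themes:
        if any(sub_theme in r["output"] for r in responses.values()):
            status[sub_theme] = "output"
        elif any(sub_theme in r["input"] for r in responses.values()):
            status[sub_theme] = "input"
        else:
            status[sub_theme] = "not covered"
    return status


# Hypothetical answers from two unnamed modelling teams.
responses = {
    "crop_model": {"output": {"Yields", "GHG emissions"}, "input": {"Management practices"}},
    "farm_model": {"output": {"Farm income"}, "input": {"Yields"}},
}
sub_themes = ["Yields", "GHG emissions", "Farm income", "Management practices", "Income distribution"]

status = classify_sub_themes(sub_themes, responses)
summary = Counter(status.values())
print(summary["output"], summary["input"], summary["not covered"])  # 3 1 1
```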

SIMeF domains, themes, and sub-themes
The SIMeF was developed through the integration of academic and policy indicator frameworks, expert opinions, and the SDGs to provide a holistic, generic, and policy-relevant dashboard for the selection of SI metrics based on the agronomic, environmental, social, and economic dimensions of SI. It consists of seven domains, 37 themes, 142 sub-themes, and 1128 metrics (Fig. 1). The domains consider aspects related to the intensification of production in association mainly with the agronomic dimension of the SIMeF (the domains of operating conditions, inputs, outputs, and input-output relationships) and sustainability aspects related to the environmental, economic, and social sustainability of agriculture (the domains of environmental, economic, and social sustainability). The domains are briefly described below.
1. Operating conditions: Operating conditions set the wider context and have a significant influence on the attainable level of SI. On the supply side, they include indirect enabling factors of SI (e.g., technology), the biophysical environment, the socio-economic and regulatory settings (e.g., markets), and the employed management practices. On the demand side, they refer to preferences for agriculture and the environment, which influence the demand levels and production approaches.
2. Inputs: We consider different inputs of agricultural production, including land, nutrients, water, capital, etc.
3. Outputs: We encompass agricultural output on various food and nonfood products and services, as well as the management of this output regarding losses and waste.
4. Input-output relationships: This domain includes composite metrics of the relationship between inputs and outputs, which denote the productivity, efficiency, and intensity of agricultural production. These are included as either aggregate or individual input outputs and are expressed in physical or economic terms to reflect that intensification can occur via both resource and economic efficiency gains.
5. Environmental sustainability: This domain provides an extensive consideration of environmental sustainability aspects, such as effects on atmosphere, water, biodiversity, soils, etc.
6. Economic sustainability: Here, economic outcomes of agricultural production are considered, such as income and economic returns, security, resilience, and competitiveness.

In the following sections, we present the themes and sub-themes per domain (Sections 3.1.1-3.1.7). Additionally, we highlight prominent sub-themes for which there is consensus between the participants of the SI metrics workshop, the reviewed frameworks, and the SDGs or which are mentioned by at least half of the reviewed frameworks. We present the full list of metrics in an interactive manner in Supplementary data 2 of Appendix A and in a spreadsheet format in Supplementary data 3 of Appendix A.

Operating conditions
The operating conditions domain sets the wider natural (biophysical conditions), socio-economic (farmers' training and networks, technological conditions and innovation, condition of markets, preferences and demand, management practices, farm structure) and policy context (governance and regulations) within which agriculture operates (Fig. 2). We identify several prominent sub-themes that establish conditions that encourage SI. Interestingly, these sub-themes point to a multi-actor approach where farmers, policy makers, and the broader society all play roles. Farmers choose the level of technology adoption, their management, and appropriate training. Policy makers influence the infrastructure availability, environmental measures and regulations, and access to input and output markets. The broader civil society establishes consumer preferences and the subsequent demand. Biophysical conditions, although important, appear less prominent, as they are largely given by the context.

Inputs
The inputs domain includes natural (land, water) and manufactured (e.g., fertilization and plant protection, energy) inputs to agricultural production (Fig. 3). Of all the input-related sub-themes, water and energy are the ones mentioned most often in the reviewed frameworks. Input metrics alone are mentioned very little by the workshop participants and the SDGs. This situation reflects an implicit recognition that assessing input management alone is less revealing than in relation to the achieved output and the associated pedoclimatic environment. Thus, composite metrics, such as those in the domain of input-output relationships, are more prominent both in the SDGs and the workshop and are more appropriate in the context of SI than those of the domain inputs.

Table 2
Models used for assessing to what extent the SIMeF metrics can be quantified by agricultural systems modelling.

| Model | Model type | Spatial resolution | Spatial extent | Temporal resolution | Temporal extent |
|---|---|---|---|---|---|
| Agro-ecological models | | | | | |
| APSIM (Holzworth et al., 2014) | process-based | field | field to region | day | input-dependent |
| WOFOST (Boogaard et al., 1998) | process-based | field | field to region | day | input-dependent |
| CATIMO (Bonesmo and Bélanger, 2002) | process-based | field | field to region | day | input-dependent |
| MONICA (Nendel et al., 2011) | process-based | field | field to region | day | input-dependent |
| DSSAT (Jones et al., 2003) | process-based | field | field to region | day | input-dependent |
| AQUACROP (Steduto et al., 2009) | process-based | field | field to region | day | input-dependent |
| SIMPLACE (Gaiser et al., 2013) | process-based | 0.25° | Europe | day | input-dependent |
| LPJmL (Schaphoff et al., 2018) | process-based | 0.5° | globe | day | input-dependent |
| Agro-economic models | | | | | |
| DEMCROP (Purola et al., 2018) | farm level | farm | input-dependent | year | input-dependent (30 years) |
| Sust-FARM (Ruiz-Ramos et al., 2020) | farm level | farm | input-dependent | year | input-dependent |
| DREMFIA (Lehtonen and Rankinen, 2015) | partial equilibrium | NUTS II (modified) | national ( | | |

Figure caption (fragment): [...] Table 1 and the characteristics of the models are outlined in Table 2. The sub-themes highlighted in bold are those identified as most prominent, i.e., for which relevant metrics i) are mentioned in consensus in the SI metrics workshop, the SDGs, and some of the reviewed frameworks or ii) are listed by at least half of the reviewed frameworks.

Outputs
The outputs domain involves the production of primary (e.g., main products or by-products) and secondary products (e.g., food such as protein supply or energy such as renewable energy production), as well as on-farm losses and post-harvest losses and waste downstream in the food value chain (Fig. 4). Post-harvest losses stand out as the only sub-theme that was mentioned in consensus by the workshop participants, the SDGs, and one of the indicator frameworks, while the sub-theme losses on farm is well noted within the reviewed themes. The reduction of losses and waste, in addition to increasing primary production, can be a successful SI approach.

Input-output relationships
The sub-themes in this domain can be broadly grouped into the themes of i) productivity, typically showing how much output is produced per unit of input, including yields, which specifically reflect land productivity; ii) intensity, generally showing the concentration of inputs per unit of land (except emissions and energy intensity, which are indexed to produced output as opposed to land units); and iii) efficiency, indicating the relationship of uptake/utilization versus input of a production input (e.g., nutrients or water) (Fig. 5). Sub-themes related to productivity (nutrient and material productivity, labor productivity) as well as energy and emissions intensity, which all associate inputs to outputs of production, feature prominently when attempting to obtain consensus amongst the workshop, the SDGs and the reviewed indicator frameworks. Nutrient and water use efficiencies, which can be improved by enhancing the management of inorganic fertilizer and/or water, are also well noted. Most sub-themes in this domain have been popular with workshop participants, denoting their relevance in the context of SI.
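The three groups of composite metrics are all simple ratios, which the sketch below illustrates with invented numbers. The function names, the field data, and the units are assumptions for illustration; the SIMeF itself specifies many more variants of these metrics.

```python
def productivity(output_amount, input_amount):
    """Output produced per unit of input (e.g., yield = output per hectare)."""
    return output_amount / input_amount


def intensity(amount, reference):
    """Input per unit of land, or emissions/energy per unit of output."""
    return amount / reference


def efficiency(uptake, applied):
    """Share of an applied input (e.g., nutrients or water) that is taken up."""
    return uptake / applied


# Hypothetical field: 2 ha, 8000 kg grain, 300 kg N applied, 180 kg N taken up,
# 1600 kg CO2-equivalent emitted.
land_ha, grain_kg = 2, 8000
n_applied_kg, n_uptake_kg, emissions_kg = 300, 180, 1600

print(productivity(grain_kg, land_ha))      # 4000.0 kg grain per ha (yield)
print(intensity(n_applied_kg, land_ha))     # 150.0 kg N per ha
print(intensity(emissions_kg, grain_kg))    # 0.2 kg CO2-eq per kg grain
print(efficiency(n_uptake_kg, n_applied_kg))  # 0.6
```

Note that emissions intensity is indexed to produced output rather than to land, in line with the exception mentioned in the text.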

Environmental sustainability
The domain of environmental sustainability considers atmosphere, water, biodiversity, soils, land use, and other resources (e.g., fossils) (Fig. 6). In this domain, we identify prominent sub-themes from all its themes, which highlights the diversity of interactions of agricultural production with its surrounding ecosystems. We identify water quantity and quality, agricultural genetic resources, species richness, soil degradation, and land use as the sub-themes that were mentioned in consensus by the SI metrics workshop, the SDG indicator framework and several of the reviewed frameworks. GHG emissions are mentioned by most indicator frameworks, emphasizing climate change as one of the most common environmental concerns. It is worth noting that the SDGs do not include an indicator on the absolute level of emissions but include composite indicators, such as 'CO2 emissions per unit of value added' (in SIMeF under input-output relationships), linking environmental sustainability to socio-economic development. Soil organic matter and soil erosion also appear prominently within the reviewed frameworks, highlighting the importance of the long-term sustainability of agricultural soils.

Economic sustainability
The economic sustainability domain includes the themes of income and growth, security and resilience, and competitiveness and trade, as well as capital and prices (Fig. 7). The economic sustainability sub-themes put forward in consensus by the workshop, the SDGs, and the reviewed indicator frameworks relate both to macro-economic (economic growth, agricultural output prices, trade performance and regulation) and microeconomic (farm income) performance. Poverty and resilience to disaster, which are closely linked to SDG1 (no poverty), are also identified as important. Notably, the SDG2 (zero hunger) indicators feature more prominently in this domain, highlighting the importance of economic sustainability for farmers and the agricultural sector overall towards eliminating hunger.

Social sustainability
The social sustainability domain focuses on the themes of equity, population dynamics, human capital, quality of life, and food and nutrition (Fig. 8). Several sub-themes on equity (income distribution, gender equity), human capital (employment, social capital), and food and nutrition (food security, undernourishment, food safety) are identified as prominent in consensus between the workshop participants, the SDGs, and the reviewed frameworks. This domain features the highest number of relevant SDGs in relation to all its themes (eight out of seventeen), highlighting the cross-cutting relevance of social conditions for sustainable development.

SIMeF operationalization approach
The SIMeF is designed to help identify the synergistic or conflicting aims and effects of SI in diverse agricultural contexts. The importance of these SI aspects depends on the particular context, stakeholders' goals, and ongoing pressures on the system. Therefore, we recommend using the SIMeF as a comprehensive dashboard of metrics from which the most relevant metrics in a given situation can be selected, ideally together with stakeholders. These metrics are then quantified with agricultural systems modelling to evaluate the SI options and potential trade-offs before their implementation. As suggested by Kanter et al. (2018), agricultural trade-off analysis consists of four phases (see Fig. 9), which we adopt here to propose an operationalization approach for the SIMeF.
First, the decision setting is characterized, and a minimum sufficient set of context-specific metrics is identified, ideally together with stakeholders. Discerning which metrics are most important in a given situation depends on the SI goals and the stakeholder requirements in their specific context. Dale et al. (2019) present a useful approach for selecting sustainability indicators together with stakeholders and emphasize that a key challenge is to balance i) the desire for comprehensiveness, ii) a process workable within time and budget constraints, and iii) diverse interests. On the one hand, it is recommended to avoid selecting too many metrics, as this creates redundancies, increases costs in terms of time and money, and complicates the interpretation (Rasmussen et al., 2017; Van Cauwenbergh et al., 2007). On the other hand, selecting too few metrics may increase the risk of omitting significant effects or important new trade-offs and developments and lead to oversimplification (Bossel, 2002; Landres et al., 1988).
To facilitate the choice of metrics from the SIMeF according to the principles outlined above, we recommend the following: a) selecting metrics from as many SIMeF domains as possible to maintain a holistic approach and decrease the risk of omitting any likely trade-offs that may emerge between different dimensions of SI, b) using the hierarchical structure of the SIMeF to gradually select, first, the priority themes and then the sub-themes and metrics in a given situation, thus avoiding the redundancy that arises from selecting too many metrics from a given theme, and c) considering whether the sub-themes identified as prominent in Figs. 2-8 (i.e., mentioned in consensus between several literature sources and/or the workshop participants and the SDGs), which are therefore highly likely to be important in diverse situations, are also relevant in the specific context that is being analyzed. The interactive presentation of the SIMeF (Supplementary data 2 of Appendix A) greatly facilitates this process by allowing navigation of its long list of metrics and involvement of stakeholders. For a more elaborate understanding of the selected metrics, we direct the reader to their original sources, shown per metric in Supplementary data 2 and 3 of Appendix A.
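Recommendations a) to c) can be read as a simple selection procedure over a hierarchical catalogue. The sketch below is one possible reading, not the procedure used by the authors: the `Metric` structure, the tiny catalogue, the metric names, and the `prominent` flags are all hypothetical placeholders rather than entries copied from the SIMeF.

```python
from collections import defaultdict
from typing import NamedTuple


class Metric(NamedTuple):
    domain: str
    theme: str
    subtheme: str
    name: str
    prominent: bool  # illustrative flag standing in for the bold sub-themes


# Hypothetical mini-catalogue; the real SIMeF holds far more entries.
CATALOGUE = [
    Metric("input-output relationships", "productivity", "yields", "crop yield (t/ha)", False),
    Metric("environmental sustainability", "water", "water quality", "nitrate leaching (kg N/ha)", True),
    Metric("environmental sustainability", "water", "water quantity", "irrigation water withdrawn (m3/ha)", True),
    Metric("environmental sustainability", "soils", "soil erosion", "soil loss (t/ha)", False),
    Metric("economic sustainability", "income and growth", "farm income", "net farm income (EUR/farm)", True),
    Metric("social sustainability", "equity", "gender equity", "share of female farm managers", True),
]


def select_metrics(catalogue, priority_themes, max_per_theme=2):
    """Pick metrics theme by theme (b), preferring prominent sub-themes (c)."""
    chosen, per_theme = [], defaultdict(int)
    # sorted() is stable: prominent metrics come first, catalogue order otherwise.
    for m in sorted(catalogue, key=lambda m: not m.prominent):
        key = (m.domain, m.theme)
        if key in priority_themes and per_theme[key] < max_per_theme:
            chosen.append(m)
            per_theme[key] += 1
    return chosen


def uncovered_domains(catalogue, chosen):
    """Domains with no selected metric, as a reminder of recommendation (a)."""
    return sorted({m.domain for m in catalogue} - {m.domain for m in chosen})
```

The cap per theme guards against redundancy, while the list of uncovered domains flags where trade-offs might go unnoticed; whether an uncovered domain matters remains a judgement to make with stakeholders.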
Second, the appropriate method is selected for quantifying the SI metrics across different SI options. In this paper, we emphasize the use of the SIMeF with agricultural systems modelling for ex-ante assessments. This involves the phases of scenario formulation and simulation or optimization. The scenarios can describe the SI options and how their performance may vary according to the boundary operating conditions within which agricultural systems operate. These boundary conditions may include dimensions such as socio-economic narratives, climate scenarios, water availability, etc. Many of these conditions are reflected in the SIMeF domain operating conditions. Scenarios are implemented in models that, via simulation or optimization, aim at reproducing and projecting likely effects over time. Once the context-specific metrics have been selected as part of the first step of the operationalization approach, one can evaluate the extent to which these metrics can likely be quantified by agricultural systems models (see Section 3.3). If the selected metrics cannot be quantified by means of modelling, SIMeF could also be combined with other methods of ex-ante or ex-post analysis, such as statistical data, measurements from experiments, qualitative scenario assessments, and expert evaluations.
Third, the results of the analysis are presented and used for decision making. Deciding on the means of communicating the effects of SI options and the potential trade-offs to stakeholders and decision makers is part of this step. Given the differences in conceptualization, units, and the extent to which different metrics are quantifiable, we do not suggest combining the SIMeF metrics into a single SI indicator for decision making. Rather, using a selection of metrics that allows for flexible weighting of different aspects by stakeholders and decision makers through, e.g., multi-criteria analysis facilitates taking into account the preferences and value systems of different stakeholder groups in the process of decision making. Frontier or trade-off curves and spider diagrams are useful formats for presenting the effects across various dimensions of SI over time and/or across diverse scenarios.
Fourth, improving the uptake of the outputs of the analysis by decision makers is the most challenging step. Irrespective of the methodological approaches employed, the final decision-making challenges of i) identifying SI strategies with overall positive impacts on the environment, the economy and society and ii) reaching consensus from the potentially diverse subjective value judgements of different stakeholders require a multi-actor approach and early engagement of the appropriate stakeholders in the decision-making process. In this process, as long as the individual indicators remain accessible, the impacts and value judgements remain transparent. The holistic structure of the SIMeF and its interactive presentation encourage this.
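The flexible weighting of metrics by different stakeholder groups can be illustrated with a basic weighted-sum multi-criteria analysis. This is a minimal sketch under stated assumptions: the SI options, metric values, and weights are invented for illustration, min-max normalisation is only one of several possible scaling choices, and the function names are hypothetical.

```python
def normalise(values, higher_is_better):
    """Min-max scale to [0, 1] so that 1 is always the preferred end."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # no contrast between options
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def weighted_scores(options, directions, weights):
    """Score each option under one stakeholder group's weights."""
    names = list(options)
    total = sum(weights.values())
    scores = dict.fromkeys(names, 0.0)
    for metric, weight in weights.items():
        column = normalise([options[n][metric] for n in names], directions[metric])
        for name, value in zip(names, column):
            scores[name] += (weight / total) * value
    return scores


# Invented metric values for three hypothetical SI options.
OPTIONS = {
    "baseline": {"yield": 6.0, "n_leaching": 40.0, "income": 500.0},
    "precision fertilisation": {"yield": 6.5, "n_leaching": 25.0, "income": 520.0},
    "high input": {"yield": 8.0, "n_leaching": 60.0, "income": 600.0},
}
DIRECTIONS = {"yield": True, "n_leaching": False, "income": True}

# Two hypothetical stakeholder groups with different priorities.
farm_weights = {"yield": 1, "n_leaching": 1, "income": 2}
water_weights = {"yield": 1, "n_leaching": 3, "income": 1}

farm_view = weighted_scores(OPTIONS, DIRECTIONS, farm_weights)
water_view = weighted_scores(OPTIONS, DIRECTIONS, water_weights)
```

With these invented numbers the two weightings rank the options differently, which is the point of keeping the individual metrics visible: the scores are an aid for discussing value judgements, not a replacement for the underlying metric values.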

Agro-ecological models
The agro-ecological models considered in this study are particularly powerful in quantifying metrics of i) outputs and, specifically, primary production (main products, other primary products) (Fig. 4) and ii) input-output relationships of an eco-physiological nature, such as nutrient and water productivities and efficiencies and yields (product yield, yield variability, yield gap) (Fig. 5). Many of these models can also be used to quantify environmental sustainability metrics associated with water quantity (crop and soil water use and storage), nutrient losses, GHG and ammonia emissions, carbon sequestration, and soil organic carbon (Fig. 6).
Many more metrics can be considered via a scenario approach. In this case, the exogenously defined scenario-specific parameters associated with many of the SIMeF sub-themes are specified, and the implications of these assumptions on the model outputs are assessed. Most models reviewed here allow the evaluation of the effects of different scenarios of i) operating conditions, with a prime capability of assessing the effects of biophysical conditions (climate, hydrological characteristics), a wide range of management practices (e.g., water management), and breeding innovation (Fig. 2), ii) inputs (irrigated land, water volume, fertilizers) (Fig. 3), and iii) some input-output relationships (chemical input intensity, cropping density) (Fig. 5).

Fig. 9. SIMeF operationalization approach for analysis and decision support. The four phases of operationalization are adopted from Kanter et al. (2018) and adjusted to the application of the SIMeF with agricultural systems modelling.
A consideration of the economic (Fig. 7) and social dimensions (Fig. 8) of sustainability is outside the scope of agro-ecological models.

Agro-economic models
Agro-economic models have the capability to endogenously estimate metrics associated with most SIMeF domains. Most of the reviewed models estimate the selected i) inputs (use of mineral fertilizers, pesticides, labor, animal feed) (Fig. 3), ii) outputs (primary production, food) (Fig. 4), iii) input-output relationships (yields, chemical input intensity) (Fig. 5), and metrics of iv) environmental sustainability (e.g., GHG and ammonia emissions, nutrient losses, agricultural land use) (Fig. 6), v) economic sustainability at the farm (farm income, farm profitability and costs) and market (self-sufficiency, trade performance, output prices) levels (Fig. 7), and vi) social sustainability in relation to food and nutrition (food availability, self-sufficiency, consumption) (Fig. 8).
Similar to agro-ecological models, agro-economic models consider many of the SIMeF sub-themes in the form of scenarios. These are mainly those under the operating conditions domain, which estimate the effects of biophysical factors (land suitability, climate), farm structure (size, specialization), policy (environmental measures and regulations), and/or demand patterns (consumer preferences, population size) (Fig. 2). Management practices are also considered, often in interaction with agro-ecological models, although the set of options is more simplified or limited compared to the latter type of models. Inputs are often considered in the form of constraints and activity requirements (e.g., land or energy). The effects of several economic drivers from the economic sustainability (input prices, subsidies) or input-output relationships (labor productivity) domains can also be assessed.
The metrics related to the market-level interactions of supply and demand, such as preferences and demand, self-sufficiency, and trade performance, as well as food availability and consumption, are only considered by partial equilibrium sectoral models, such as DREMFIA, CAPRI, and/or MAgPIE.
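Checking which selected sub-themes can be quantified endogenously by which model type amounts to a lookup against a capability table. The sketch below assumes a deliberately small table, loosely drawn from the sub-themes named in the two preceding subsections; it is not the full mapping of the reviewed models, and the dictionary and function names are hypothetical.

```python
# Illustrative subset of sub-themes per model type, based on the text above.
CAPABILITIES = {
    "agro-ecological": {
        "primary production", "yields", "water use efficiency",
        "nutrient losses", "GHG emissions", "soil organic carbon",
    },
    "agro-economic": {
        "primary production", "yields", "GHG emissions", "nutrient losses",
        "farm income", "trade performance", "food availability",
    },
}


def coverage(selected_subthemes):
    """Map each selected sub-theme to the model types able to quantify it."""
    return {
        s: sorted(m for m, caps in CAPABILITIES.items() if s in caps)
        for s in selected_subthemes
    }


def gaps(selected_subthemes):
    """Sub-themes that no listed model type covers; candidates for other methods."""
    return [s for s, models in coverage(selected_subthemes).items() if not models]
```

For example, `gaps(["yields", "farm income", "gender equity"])` returns `["gender equity"]` with this table, signalling that the sub-theme would need statistical data, expert evaluation, or another complementary method.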

Identified gaps in the quantification of SI metrics by models
Despite the strong capabilities of agricultural system models to quantify many aspects related to the SI of agricultural production, significant gaps remain. The key ones are outlined below.
• Non-economic farmers' behavior: The aspects of the behavioral dimensions of farmers' decision making beyond the optimization of economic performance are often not explicitly considered by the reviewed agricultural system models. These behavioral aspects can relate to the farmer's implicit knowledge (e.g., farmers' training and networks), tendency to innovate or avoid risk and change (e.g., products innovation, implications of farm age structure), business governance (e.g., due diligence, holistic audits), and underlying farmers' attitude towards farming, agriculture, and nature.
• Societal demands and governance of agriculture and the environment: The effect of wider societal demands and governance structures, e.g., coherence of policy at different levels or the aspirational preferences of the public regarding agriculture and the environment, are less tangible conditions that are difficult to consider via quantitative modelling.
• Social sustainability: With the exception of some indicators related to food and nutrition, all other aspects of social sustainability, including equity (e.g., income distribution, gender equity, discrimination), human capital (employment, labor conditions), quality of life (e.g., coverage of basic needs, animal welfare), and population dynamics (e.g., urban-rural population, migration), are not considered by any of the reviewed models.
• Establishment of appropriate market conditions: Market conditions (infrastructure availability, access to input, output and capital markets, land ownership), which relate to more complex institutional and logistical arrangements, are often not explicitly considered.
• Provision of non-food products and ecosystem services: Consideration of outputs other than food is limited, and the less traditional products (energy, materials) and services (tourism, other ecosystem services) of agriculture are represented in none or only a few of the models (e.g., energy in MAgPIE and CAPRI).
• Management of losses and waste: Most models are concerned with produced output, but only a few agro-economic models (e.g., MAgPIE and CAPRI via scenarios) and none of the agro-ecological models consider which part of this output is lost post-harvest (e.g., during transport or storage) or wasted along the food value chain. The on-farm losses associated with disease regulation also seem to be captured less often.
• Security and resilience: Resilience is a multifaceted and case-specific concept, and several aspects associated with resilience (e.g., yield stability, income variability) can be considered and/or assessed by the reviewed models; however, aspects such as poverty levels, resilience to disaster (e.g., costs of recovery) or income diversification are not considered by the participating models.
• Biodiversity: Biodiversity metrics with respect to flora, fauna, and agricultural genetic resources are quantified by only a limited set of models (e.g., DREMFIA considers species richness).
• Soil health: Even though agro-ecological models are good at modelling carbon, nitrogen, and, to some extent, phosphorus plant-soil interactions, several aspects relating to long-term soil health and fertility (e.g., soil erosion, compaction, contamination) are less well represented.
Further efforts on model development and collaboration with other research areas are of particular relevance for quantifying SI metrics related to the above aspects, as further discussed in Section 5.2.

Discussion
Our study proposes that a holistic, generic, and policy-relevant SI metrics framework, such as SIMeF, together with agricultural systems modelling for the ex-ante assessment of SI options is a powerful approach to aid SI-related decision making. Following a detailed presentation of the SIMeF and an evaluation of the extent to which its metrics can be quantified by representative agricultural systems models, here, we critically evaluate the SIMeF and, based on the identified gaps in the quantification of SI metrics by models, we reflect on potential ways forward to address these gaps.

Potential and limitations of the SIMeF for analysis and decision support
The SIMeF consolidates and combines expert knowledge, the SDGs, and several other existing frameworks into a list of metrics structured along a nested arrangement of domains, themes, and sub-themes pertaining to SI. By addressing a broad range of themes, the SIMeF offers a comprehensive overview of the multifaceted nature of SI of agriculture and its links to sustainable development policy priorities. Its holistic approach, facilitated by its hierarchical structure, permits identifying, assessing, and addressing the potential synergies and trade-offs emerging in the context of SI. Furthermore, it allows for applications to very diverse SI options at different scales and levels throughout the agricultural value chain in a generic and flexible way. Therefore, the holistic and flexible SIMeF lends itself to transparent assessments of SI options in any decision-making context. Because the SIMeF consolidates other existing frameworks, the level of detail, scope, and scales naturally differ between its individual metrics. Some metrics are generic, while others are more specific or well defined. Some metrics may be applicable at the field scale, while others may be applicable at the regional or sectoral levels. Furthermore, the list of metrics is not always exhaustive with respect to their respective theme and domain, and many more detailed metrics exist (e.g., Pinstrup-Andersen, 2009; Schiefer et al., 2015; UN ESCAP, 2009). In particular, with respect to biodiversity, we note that increasing societal concerns and recent policies (e.g., the Biodiversity Strategy for 2030 in Europe; European Commission, 2020) create the need for wider and potentially more complex sets of biodiversity indicators (see, for example, Feest et al., 2010) that can complement those that are currently included in the SIMeF.
Unsurprisingly, the trade-offs between comprehensiveness, flexibility, consistency, and appropriateness for specific analyses and decision-making contexts, which are inherent in the design of generic frameworks, are also apparent in the case of the SIMeF. We made an effort to strike the appropriate balance but would welcome additions and further improvements to the SIMeF as further research. Disentangling the effects and specifying the metrics and the extent to which they can be quantified at different scales would be particularly interesting. Finally, the use of the SIMeF for decision making is constrained by the availability of information on the different metrics. As exemplified by the SUSTAg model portfolio, not all aspects can be quantified in all cases (see Section 3.3.3), and this limitation can be amplified in world regions with limited data availability. We argue that the SIMeF can still serve as a useful dashboard for identifying the aspects that are of relevance in specific contexts, while further efforts are needed to fill data and knowledge gaps. The following section provides some suggestions.

Filling the gaps in the quantification of SI indicators
Section 3.3.3 identified gaps in the quantification of SI metrics by agro-ecological and agro-economic systems models. Some of these gaps can be filled by further developments in agricultural systems modelling. For example, modelling the demand and supply for non-food products and services of agricultural production, such as bioenergy (e.g., Mouratiadou et al., 2020) or ecosystem services (e.g., DAKIS project; Feld, 2018), is an ongoing development. The management of losses and waste can be taken into account in the form of scenarios regarding consumer behavior and/or management in the context of the circular economy (e.g., Oldfield et al., 2016). Losses on farms due to, for example, disease prevalence or extreme events can be represented via probability distributions (Webber et al., 2018a, 2018b). Enhancing modelling approaches with explicit spatial and landscape-specific dimensions can permit the consideration of logistical effects (e.g., market proximity and management of losses) or the role of land use on biodiversity (e.g., species migration, pollination services, pest and disease impacts) (Nendel and Zander, 2019). Finally, the soil components of agro-ecological models can be improved to consider further physical, biological, and chemical effects of agricultural production on soils, including in the landscape or watershed context (Nendel and Zander, 2019).
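Representing on-farm losses via probability distributions can be sketched as a Monte Carlo draw applied to a simulated potential yield. The example below is an illustrative assumption and not the method of the cited studies: it supposes that a damaging event occurs in a given season with a fixed probability and that, when it occurs, the lost fraction of yield follows a beta distribution; all parameter names and values are hypothetical.

```python
import random
import statistics


def simulate_net_yield(potential_yield, event_prob, loss_alpha, loss_beta,
                       n_draws=10_000, seed=0):
    """Draw net yields after stochastic on-farm losses (hypothetical model).

    potential_yield: yield without losses, e.g. from a crop model (t/ha)
    event_prob:      assumed seasonal probability of a damaging event
    loss_alpha/beta: assumed beta-distribution shape of the lost fraction
    """
    rng = random.Random(seed)  # seeded for reproducible draws
    draws = []
    for _ in range(n_draws):
        # Lost fraction is 0 in event-free seasons, beta-distributed otherwise.
        lost = rng.betavariate(loss_alpha, loss_beta) if rng.random() < event_prob else 0.0
        draws.append(potential_yield * (1.0 - lost))
    return draws


draws = simulate_net_yield(potential_yield=8.0, event_prob=0.2,
                           loss_alpha=2.0, loss_beta=5.0)
mean_net_yield = statistics.fmean(draws)
# Spread of net yield across seasons, usable as a simple yield-stability metric.
yield_sd = statistics.pstdev(draws)
```

Under these assumptions the expected net yield is `potential_yield * (1 - event_prob * loss_alpha / (loss_alpha + loss_beta))`, which gives a quick consistency check on the simulated mean.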
The consideration of some other, mainly economic and social, aspects requires further interdisciplinary and transdisciplinary collaboration and cross-fertilization with other methodological approaches to be more firmly embedded in the evaluation of SI options. To identify preferences, analyze institutional and governance structures, and formulate narratives on behavioral aspects and innovation, stakeholder-based analyses conducted in collaboration with social scientists (e.g., Sewell et al., 2014) or via semi-quantitative assessments (e.g., Mouratiadou and Moran, 2007; Quinn et al., 2013) can be insightful. Semi-quantitative assessments and expert consultations can also be complementary in the case of limited data availability, but such approaches necessitate careful setup in order to minimize stakeholder biases affecting the results. Other modelling approaches, such as agent-based modelling, can capture aspects such as cooperation, competition, or bounded rationality of individual decision making and resilience (e.g., Guillem et al., 2015; Rasch et al., 2017). Rule-based models (e.g., Hutchings et al., 2012; Minoli et al., 2019; Waha et al., 2012) or classifications of representative agents integrated into models (e.g., McCollum et al., 2017) can play a role in the representation of behavioral patterns. Heuristics, indicator development, analysis of governance structures and modelling are proposed as effective methods for resilience-based analysis (Ge et al., 2016). Impacts on human health and animal welfare can be informed via medical and epidemiological research. Some examples where meta-relationships on health effects are integrated into modelling exist (Springmann et al., 2016; Vrontisi et al., 2016). Input-output or general equilibrium modelling can provide insights on employment or distributional aspects (e.g., Baldos et al., 2019).
The recognition of the requirements for further interdisciplinary and transdisciplinary integration is not new, yet more endeavors are needed in this direction. This study is helpful in guiding these endeavors by providing a clear description of the gaps.
We acknowledge that the evaluation of the extent to which SIMeF metrics can be quantified was performed using a finite model sample, and the results may be different with a larger sample and a greater diversity in the types of models considered. However, our conclusions are in agreement with other recent studies evaluating model capabilities to assess sustainability. For example, van Soest et al. (2019) highlight that aspects of human development, good governance, and heterogeneity are not well represented in integrated assessment models and call for cooperation in multi-model frameworks and other disciplines. Kanter et al. (2018) note that gaps in terms of what models can estimate are present regarding human well-being (e.g., gender equity and empowerment) and resilience indicators, and they propose improved scientist-stakeholder engagement in the research process.
Our analysis focuses on identifying which sub-themes are quantifiable or considered by agricultural systems models, but it does not assess the extent to which individual metrics are quantifiable, nor does it discuss the confidence in the model quantifications. Although this step is outside the scope of this paper, we note that the underlying assumptions adopted by individual models differ, as does the exact definition of the metrics that they can quantify per sub-theme. Furthermore, the uncertainty propagation associated with different model assumptions largely affects the outcome of ex-ante assessments and is an area where further progress is needed. A significant focus on multi-model intercomparison exercises in the last decade highlights efforts to improve models and strengthen confidence in their projections (Martre et al., 2015; Müller et al., 2017; Rodríguez et al., 2019; Rötter et al., 2011; Wallach et al., 2016).

Conclusions
Agricultural production is embedded in complex interacting socio-economic and natural systems. As a result, the success of SI policies and practices is context- and case-specific and can change over time.
Since the multi-faceted nature of SI makes different effects of SI options possible, value-based case-specific decision making is necessary. This situation requires approaches that can provide comprehensive and systematic information on the likely agronomic, environmental, economic, and social effects of SI options in diverse contexts.
To this aim, in this paper, we present the SI metrics framework SIMeF, which proposes a holistic, generic, and policy-relevant approach for quantifying and assessing SI options over thematic areas and model types. The proposed structure is simple and transparent yet follows a systems approach to capture the complexity of agricultural systems. New metrics can be easily added, and different metrics can be flexibly selected, if appropriate, in collaboration with stakeholders, and aggregated in combination with decision-making tools, such as multi-criteria analysis. Although not exhaustive, by consolidating numerous indicator frameworks and expert opinions, the SIMeF provides an extensive and unified overview of the themes and metrics associated with SI. Furthermore, it highlights their policy relevance with respect to sustainable development priorities, as represented by the SDGs. The SIMeF is operational at different scales. In this paper, we propose its operationalization with agricultural systems modelling to facilitate ex-ante assessments and inform decision making regarding future SI options. However, we note that the SIMeF can be combined with other quantitative and qualitative assessment tools for ex-ante and ex-post analyses.
To reflect on the operationalization of the SIMeF with modelling, we evaluate to what extent the SIMeF sub-themes can be quantified by different types of agricultural systems models. We find that the integration of agro-ecological and agro-economic models allows a unified systems approach for the quantification of productivity and resource use efficiency, as well as environmental, economic, and, to a lesser extent, social sustainability under diverse operating conditions. However, important gaps remain in the model representation of both socioeconomic and natural environments, as represented by the given sample of models. Regarding socio-economic aspects, these gaps include i) behavioral aspects that go beyond the optimization of economic performance, such as cooperation or uptake of innovation, ii) societal demands and governance of agriculture and the environment, iii) social sustainability aspects related to equity, human capital, quality of life, and animal welfare, and iv) the establishment of appropriate market conditions. The representation of multi-functional agricultural systems that provide diverse goods and services, the management of losses and waste, security and resilience, and biodiversity are also insufficiently captured.
Better consideration of these aspects requires greater interdisciplinary integration with social and health scientists, transdisciplinary approaches through stakeholder and expert consultations, and advances in the capacity of agricultural systems models to represent more complex processes. Embracing a systems approach to enable cross-fertilization with other types of qualitative or quantitative analysis can facilitate this advancement.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
This work was conducted in the context of the SUSTAg project, funded in the frame of the ERA-NET FACCE SURPLUS, which has received funding from the European Union's Horizon 2020 research and innovation programme [grant number 652615] -4330]. We would like to express our gratitude to the participants of the scientific workshop on SI metrics held in Berlin on 22nd May 2017. In particular, we acknowledge the valuable contributions of the following presenters and moderators of the scientific workshop on SI metrics: A. Beblek, J. Heinke, K. Helming, A. Meyer-Aurich, K. Schmidt, F. Sinabell and Gerrie van de Ven. We thank Ton Markus from the Cartography Department of the Faculty of Geosciences of Utrecht University for support in producing the figures and Meinou de Vries from Studio Infograph for the interactive visualization of the SIMeF presented in Supplementary data 2 of Appendix A. We thank the anonymous reviewers of the manuscript for their constructive comments.