New Generation of Agricultural System Models, Data, and Knowledge Products

This paper presents ideas for a new generation of agricultural system models and data that could meet the needs of a growing community of end-users exemplified by a set of Use Cases. We envision new models and knowledge products that could accelerate the innovation process needed to achieve sustainable local, regional and global food security. We identify desirable features for models, and describe some of the potential advances that we envisage for model components and their integration. We also discuss possible advances in model evaluation and strategies for model improvement, an important part of achieving our vision.


INTRODUCTION
The idea of creating a new generation of agricultural system models and knowledge products is motivated by the convergence of several powerful forces. First, there is an emerging consensus that a sustainable and more productive agriculture is needed that can meet the local, regional and global food security challenges of the 21st century. This consensus implies there would be value in new and improved tools that can be used to assess the sustainability of current and prospective systems, design more sustainable systems, and manage systems sustainably. These distinct but interrelated challenges in turn create a demand for advances in analytical capabilities and data. Second, as discussed in the companion paper on The State of Agricultural System Science, we now have a large and growing foundation of knowledge about the processes driving agricultural systems. Third, rapid advances in data acquisition and management, modeling, computation power, and information technology provide the opportunity to harness this knowledge in new and powerful ways to achieve more productive and sustainable agricultural systems, as discussed in the companion paper on Building an Open, Web-Based Approach.
Our vision for the new generation of agricultural systems models is to accelerate progress towards the goal of meeting global food security challenges sustainably. In this paper and the companion paper on information technology and data systems, we employ the Use Cases presented in the Introductory paper, and our collective experiences with agricultural systems, data, and modeling, to describe the features that we think the new generation of models, data and knowledge products need to fulfill this vision.

Use Cases: Implications for Next Generation Models
We now discuss the implications of the five Use Cases for the development of second generation models and knowledge products. Table 1 summarizes their characteristics.
Cropping and farming systems models and data are needed to produce the results for the smart phone application, and thus help Sizani deliver farm-specific advice to increase maize productivity and stability and to increase the economic and nutritional well-being of the farm family. The cropping system models are needed to simulate maize, beans, and vegetables that are produced by the farmer. In addition, the models need to take into account the benefits of using new varieties of maize and beans that are tolerant to high temperature and drought, since these are projected to increase under changing climate conditions. Furthermore, the crop models need to be able to simulate the effects of small increases in inorganic fertilizer as well as organic matter, and to simulate the effects of partially harvesting rainfall. A crop disease module is needed to simulate the effects of foliar diseases for susceptible and tolerant varieties.

Developing and Evaluating Improved Crop and Livestock Systems for Sustainable Intensification
Xiaoming is a plant breeder/geneticist working on developing a drought- and heat-tolerant hybrid of maize. A maize cropping system model is needed that has the capability to predict the benefits of the new drought- and heat-tolerant maize varieties under the range of soils, weather, and management conditions across the regions of interest in Africa. Furthermore, a household economic model is needed to evaluate the adoption of the new maize varieties and the resources (e.g., access to credit, labor, and fertilizer inputs) needed to produce them. One question would be about the costs of purchasing the new variety, as well as the benefits and risks of growing it relative to traditional varieties. Therefore, information is needed on the household resources and constraints as well as information on the yield gains expected by switching to the new variety and the overall impacts on the economic livelihood of the farm family. Costs of inputs and likely prices of grain are needed for the economic model. Also, soil, weather, and management information are needed as inputs to run the crop and household models to evaluate the switch to the new variety.
The model-based analysis needs to take into account the risks associated with weather variability in the short term as well as responses to changes in climate that are projected for the longer term. Assuming that the farmer grows other crops for food/fodder and for sale and has livestock, models for these other enterprises are also needed. Ideally, the crop and household economic model would be used to perform simulation experiments, similar to how a randomized controlled trial might be performed if that were possible. Results from these simulation or optimization experiments would allow Debora to evaluate multiple factors, such as variability in maize grain and fodder yield, income, return on investment, and nutrition.

Investment in Agricultural Development to Support Sustainable Intensification
Sampling sets of regional parameters that can be representative of the landscape as a whole is necessary before implementing crop or livestock production models. The analyst is faced with balancing the accuracy of representation of the landscape against the proliferation of model runs and their associated expenses. This first step in project design requires careful summaries of the range of soils, altitudes, microclimates, and water resources systems in the whole area.
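The trade-off between landscape representation and the number of model runs can be illustrated with a minimal stratified-sampling sketch. It is not a method prescribed in this paper: the function name `stratified_sample`, the stratum definition (soil type by altitude band), and the example landscape are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(units, key, per_stratum, seed=0):
    """Pick up to `per_stratum` units from each stratum of the landscape.

    Returns (unit, weight) pairs, where the weight is the number of
    landscape units that each sampled unit stands for.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = min(per_stratum, len(members))
        for unit in rng.sample(members, k):
            sample.append((unit, len(members) / k))
    return sample


# Hypothetical landscape of 12 mapping units (illustrative, not real data).
landscape = [
    {"id": i, "soil": soil, "altitude_band": band}
    for i, (soil, band) in enumerate(
        [("clay", "low")] * 6 + [("sand", "low")] * 3 + [("sand", "high")] * 3
    )
]

# Two model runs per stratum: 6 runs stand in for all 12 units.
runs = stratified_sample(
    landscape,
    key=lambda u: (u["soil"], u["altitude_band"]),
    per_stratum=2,
)
for unit, weight in runs:
    print(unit["id"], unit["soil"], unit["altitude_band"], weight)
```

Raising `per_stratum` improves representation within each stratum at the cost of more model runs, which is the balance the analyst has to strike.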
Since animal production is an integral part of the farming system, the livestock model should be integrated with the smallholder crop models. Ideally, both livestock and crop models can be run simultaneously, thus showing the nutrient flows between different production sectors and the sustainability of the system as a whole. The cropping system model used will need to represent year-after-year carry-over of soil carbon, soil fertility, residue return, and use of both animal manures and inorganic fertilizer.

Management Support for Precision Agriculture in the US for Profitability, Soil Conservation and Water Quality Protection
Process-oriented crop growth models simulate the effects of genetics, management, weather and stresses on the daily growth of crops using carbon, nitrogen and water balance principles. The strength of these models is their ability to account for stress by simulating the temporal interaction of stress on plant growth each day during the season. Thus, they tend to be sensitive to temporal patterns of stress. However, these models were designed for homogeneous areas, and as a result, inputs that are spatial in nature must be assumed to be uniform. Furthermore, spatial characteristics are often unknown or difficult and expensive to measure. The advent of Precision Agriculture has resulted in the need to extend the use of point-based crop models to account for spatial processes. Crop models can provide useful estimates of potential economic return for management recommendations, along with the sensitivity of a recommended management action in response to weather variability. The next generation of crop models for Precision Agriculture will account for spatially connected processes and use publicly available data on soil type and weather forecasts, along with location-specific data from farmers' yield maps, to provide a prescriptive crop management plan at a very high spatial resolution.

Supplying Food Products that Meet Corporate Sustainability Goals
The system models needed to support supply chains in their pledge for sustainability are the same system models described in the precision agriculture Use Case. Crop system models are able to simulate the annual fluxes of N2O from soils under different pedoclimatic and management conditions rather well, but their performance requires improvement when simulating the daily fluxes of N2O. As N2O emission is directly linked to the amount of fertilizer used, the next-generation models will play a crucial role in identifying the optimal N rate that maximizes profits and reduces nitrous oxide emissions and nitrate leaching.
Table 2 summarizes a number of agricultural system model features that are suggested by the Use Cases. These have important implications for the design of new-generation models and knowledge products.
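The idea of an optimal N rate can be made concrete with a short worked sketch. It assumes a quadratic yield response, yield = a + b*N - c*N**2, which is a common simplification and not a model taken from this paper; the coefficients, prices, and the optional per-kilogram environmental charge `env_cost` are hypothetical values chosen only to illustrate the arithmetic. Setting the derivative of profit with respect to N to zero gives N* = (b - (n_price + env_cost) / grain_price) / (2c).

```python
def economic_optimum_n_rate(b, c, grain_price, n_price, env_cost=0.0):
    """N rate (kg N/ha) that maximizes profit for yield = a + b*N - c*N**2.

    b           : kg grain gained per kg N at N = 0 (hypothetical)
    c           : curvature of the response, must be > 0 (hypothetical)
    grain_price : price received per kg grain
    n_price     : price paid per kg N
    env_cost    : assumed extra charge per kg N for emissions or leaching
    """
    if c <= 0 or grain_price <= 0:
        raise ValueError("c and grain_price must be positive")
    n_star = (b - (n_price + env_cost) / grain_price) / (2.0 * c)
    return max(n_star, 0.0)  # a negative optimum means applying no N


# Illustrative numbers only.
n_profit_only = economic_optimum_n_rate(b=30.0, c=0.1, grain_price=0.15, n_price=0.9)
n_with_env = economic_optimum_n_rate(
    b=30.0, c=0.1, grain_price=0.15, n_price=0.9, env_cost=0.6
)
print(round(n_profit_only, 1), round(n_with_env, 1))
```

Under these assumed values, attaching a cost to each kilogram of N lowers the optimal rate, which is the kind of profit and emissions trade-off a process-based model would need to quantify with site-specific response curves.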

Implications for Second Generation Models and Data
All of the small-holder use cases (1-3) require whole-farm models, and decision-makers in the commercial crop use cases (4 and 5) are likely to want whole-farm information as well, even if the specific use case (e.g., precision nitrogen application) does not require it. All cases need spatially referenced data, but the type and resolution of data required varies across the Use Cases. All of the Use Cases need biophysical production outputs and economic outputs. The need for environmental and social outputs is case-specific.

Designing Next Generation Models
Given the gap between the current state of agricultural systems models and the needs of actual and potential users, this section discusses how the new generation of models can be created to bridge this gap and realize the vision for next generation models presented in the Introduction.
Demand-Driven, Forward-Looking Approach
A first step towards realizing the potential for agricultural systems models is to recognize that until now, most model development has been motivated by research and academic considerations, not by user needs. This means that the model development community needs to turn the model development process "on its head" by starting with outcomes and working back to the models and data needed to quantify relevant model outputs. For example, the Use Cases show that in most cases whole-farm models are needed, and particularly for small-holder farms, models are needed that take into account interactions among multiple crops and often livestock. Yet, many agricultural systems models represent only single crops and have limited capability to simulate inter-cropping or crop-livestock interactions.

A Systems Approach
The Use Cases show clearly the need for whole-farm systems approaches. Agricultural systems are managed ecosystems (or agroecosystems) comprised of biological, physical and human components operating at various scales (e.g., cell, organism, field, farm). Farms are embedded within larger ecological and human systems operating at regional scales (e.g., watershed, population), as well as larger (continental, national, global) scales. It is typically important to consider many different interactions within and among these systems if we are to meet stakeholder needs for actionable outcomes.
The systems approach has several important implications for second generation models. Within each system level, a set of interacting sub-systems is involved. This suggests the possibility of constructing models of large, complex systems by combining models of modular sub-systems. The level at which modularization may be possible remains an important question, and this in turn has implications for software engineering. For example, as discussed in the companion State of Science paper, many crops are now modeled individually and separately from livestock. Figure 1 presents a diagram of the linkages between the "pre-competitive space" of basic science and model development, and the "competitive space" of knowledge product development. The arrows between these two "spaces" point both ways to represent the inevitable and important give-and-take. There is a need for a demand-driven but forward-looking process that enhances interactions between these two realms. The concept of "pre-competitive space" grew out of the efforts of the pharmaceutical industry to collaborate on basic research while competing in product development. We think this distinction is also useful for thinking about how we might develop and apply agricultural systems models, while recognizing that there is also a competitive element among the researchers in the model development arena.

An Open, Pre-Competitive Space for Model and Platform Development Linked to a Competitive Space for Knowledge Product Development
Facilitating a pre-competitive environment is likely to require innovations in the way research organizations operate, and may need to involve public-private partnerships (PPPs). PPPs are one way that science and industry can collaborate to generate new applied knowledge that can feed into the creation of new business and services. In PPPs it is common that both private and public partners provide funding and jointly formulate the research questions that can subsequently be tackled by research institutes and universities. There are a number of challenges in structuring PPPs. For example, in the European Union PPPs have been regulated to avoid unfair competition. The EU regulations stipulate that there always has to be more than one private partner involved and that intellectual property rights to the knowledge developed (e.g., tools, models, articles, methods) belong to the research partner, which can then license the use to private partners for commercial purposes.

New Approaches to Data Acquisition, Management and Use
The explosion in the availability of many kinds of data and the capability to manage and use it creates new opportunities for systems modeling at farm and landscape scales. Figure 2 presents an example of the possible types of private and public data that could be generated and used for both farm-level management (as in Use Cases 1, 4 and 5) and landscape-scale investment and policy analysis (Use Cases 2, 3 and 5). Some of these data would be generated and used at the farm level; others would be generated and used for landscape-scale analysis to support investment decision-making and science-based policy-making.

Credibility, Uncertainty and Model Improvement
A clear message from the NextGen Stakeholder Workshop was that model credibility is a key issue limiting the use of models for decision-making. In some areas of commerce where long-term projections are important, for example the insurance industry, there has been growing acceptance and use of quantitative climate models and impact assessment models. But for many decision-makers, ranging from farmers and agribusiness, to the development donor community and government, quantitative models remain an arcane and poorly understood part of science.
There are many aspects to establishing, maintaining and improving model credibility. First and foremost, models must be relevant to users' information needs. In addition, the participants in the Stakeholder Workshop emphasized the need to communicate what models are, what they can and cannot do, and to quantify and communicate model uncertainty effectively so that users understand how to use model outputs. But besides being relevant to users' needs, models must perform well enough to be judged credible and useful. As the companion paper on The State of Agricultural Systems Science shows, there are many shortcomings of current models' capabilities that limit their relevance and usefulness for the Use Cases described here and the others discussed in the NextGen Stakeholder Workshop. Thus, achieving NextGen goals will involve developing better data and methods to evaluate model performance, both to help developers improve them and to help inform end-users about their validity and reliability.

Potential Advances in Model Components
We next present examples of potential improvements that are important and may be achievable in the disciplinary components of agricultural systems models. We begin with a set of cross-cutting issues that are common to all of the model components, and then focus on disciplinary themes.

Cross-Cutting Issues

Representing and Incorporating Human Behavior into Agricultural Systems Models
Agricultural systems are managed by people for people. The objectives of the people using the information generated by models, and the behavior of decision makers whose behavior is represented in models, must influence model design. Most existing models have a limited capability to represent economic or other behavioral motivations of decision makers. This is a cross-cutting theme in modeling because the management decisions made by farmers relate to crop and livestock productivity as well as to economic costs and returns and to environmental and social outcomes. There are several ways that behavior needs to be incorporated into NextGen models.

Representing Heterogeneity
A key fact that has emerged from the increasing availability of field- and farm-level data is the high degree of biological, physical, economic and social heterogeneity of agricultural systems, in both space and time. The farms represented by the use cases demonstrate this point: among smallholder maize-based farms in Kenya, for example, coefficients of variation of key characteristics like farm size are on the order of 100% or more; for commercial crop farms in the United States, they are also large, ranging from 50% to 150%. This heterogeneity has several important implications for how we represent agricultural systems in models: accurate representation of biophysical processes (e.g., crop growth, chemical leaching, erosion, chemical runoff) requires site-specific data (i.e., soils, slope, weather, management).
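For readers less familiar with the statistic quoted above, the coefficient of variation is the standard deviation divided by the mean, usually expressed as a percentage. The sketch below computes it with Python's standard library; the farm sizes are hypothetical illustrative values, not survey data from Kenya or the United States.

```python
import statistics


def coefficient_of_variation(values):
    """Sample coefficient of variation (standard deviation / mean), in percent."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical farm sizes in hectares (illustrative only).
farm_sizes_ha = [0.5, 1.0, 1.5, 2.0, 5.0]
cv_percent = coefficient_of_variation(farm_sizes_ha)
print(f"CV of farm size: {cv_percent:.1f}%")
```

A value near or above 100% means the spread of the characteristic is as large as its average, so a single "representative" farm describes few of the actual farms well.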

Representing Dynamics
Agricultural systems are inherently dynamic. For example, crop growth occurs over time within the growing season, and crop productivity across growing seasons depends on crop rotations and other dynamics of the system.
Most biophysical system component models (crop growth, livestock growth, environmental processes) are inherently dynamic, but can only represent heterogeneity to limited degrees. Economic behavior depends on expectations of future outcomes, and decisions are made sequentially, with information being acquired as decisions are made and realizations are observed.

Pathway and Scenario Design
Not everything that influences an agricultural system, whether at the field, farm or regional scale, can be modeled. Consequently, most modeling is based on a logical structure in which some factors ("drivers", or exogenous variables) take on values specified by the modeler or the model user. How these drivers are set or modified to represent the conditions under which the analysis is being carried out is a key aspect of modeling that has been under-studied. The issue is now receiving more attention in climate research (cite Moss, SSPs), but needs to receive more attention from the model development community. In particular, if models are to be linked to end-users through knowledge products, the user needs to understand the context in which the analysis or "simulation experiment" is being conducted. There has been little attention paid to how end-users could define or select the conditions or assumptions under which the modeling is carried out. These issues relate directly to the considerations of relevance and credibility discussed above.

Crop Systems
Next steps in developing next-generation crop models fall into several categories: significant improvements in simulation of important crop processes and responses to stress; extension from simplified crop models to complex cropping systems models; and scaling up from site-based models to landscape, national, continental, and global scales.

Key crop processes that require quantum leaps in improvement
Several crop processes require major advances in understanding and simulation capability in order to narrow uncertainties around how crops will respond to changing atmospheric conditions. Experimentalists and modelers need to work together from the outset to ensure that the right research questions are posed as experiments are planned, critical field data are gathered at appropriate times, and process-based understanding is captured so as to transfer new insights from the field to the crop models directly and expeditiously.
Developing predictive capacity that scales from genotype to phenotype is challenging due to biological complexities associated with genetic controls, environmental effects, and interactions among plant growth and development processes. Crop model improvements are needed to link complex traits at gene network, organ, and whole plant levels. Phenotypes are linked to changes in genomic regions via associations with model coefficients (Hammer et al., 2006).

Extension from 'crop models' to 'cropping system models'
The field of crop modeling has been built on a single crop-by-crop approach. It is now time to create a new paradigm, moving from "crop" to "cropping system."
Intercrops and complex rotations. A first step is to set up the simulation technology so that modelers can rapidly incorporate multiple crops within fields, and multiple crops over time. Then the response of these more complex cropping systems can be tested under different sustainable intensification management strategies utilizing the updated simulation environments. Similarly, studies can be performed to determine optimal cropping systems and management strategies for particular desired outcomes.
Pests, diseases, and weeds and their management. Diseases, pests, and weeds (DPW) are important yield-reducing factors in terms of food production and economic impact, and pose significant simulation challenges due to complex processes that occur over fine temporal but broad spatial scales. For each crop species, there is a portfolio of diseases, pests, and weeds, interacting over a range of time and space scales. Model improvements for DPWs include developing process-based models for important diseases and vectors, frameworks for coupling air-borne diseases to crop models, gathering significantly more data on crop impacts, and enabling the evaluation of pest management strategies.

Scaling up from field scale to landscape scale
Cropping system models need to be able to simulate easily a diverse set of farms rather than just one representative farm, as has been common practice in the past. There are several approaches for scaling up, including use of gridded models and development of simpler quasi-empirical models for landscape-scale analysis (Lobell and Burke 2010). Large-scale computation can allow for much more extensive use of gridded models than in the past (Elliott, Kelly, et al. 2014). Soils and climate input datasets become important as simulation goes from field to landscape scale. There are several types of dynamic process gridded crop models: those developed from the site-based models such as DSSAT and APSIM; ecosystem-based models; and dynamic land-surface models. An example of a more statistical model is the agroecological zone (AEZ) approach developed by IIASA and the FAO (Fischer et al. 2002).

Crop Model Interoperability and Improvement
A key question for the next generation of cropping system models is the degree of interoperability. Historically, scientists (as individuals or groups) tended to have exposure to, and in-depth knowledge of, a single crop model (Thorburn et al. 2014). The Agricultural Model Intercomparison and Improvement Project (AgMIP) aims to increase efficiency of model improvement and application by sharing information between different models and encouraging the use of multiple models in impact assessment (Rosenzweig et al., 2013). Ideally, parameters from one crop model can be uploaded into databases and then downloaded and reformatted for use in another model. However, AgMIP has found that this sharing of parameter values between models is not necessarily straightforward.

Soils and Precision Management
Integrated agricultural technologies, defined as the integration of improved genetics, agronomic input, information technology, sensors, and intelligent machinery, will play a pivotal role in agriculture in the years to come. These innovations will be driven by economic forces, by the need to produce more food with limited land and water for the increasing population, and at the same time by the push to save resources to reduce the environmental impact associated with food production. While these changes are occurring now in the commercial-scale industrialized agricultures of the world, many of these technologies have the capability to be adapted to conditions in other parts of the world.

Pests and Diseases for Crops and Livestock
As noted above, a major limitation of existing models is how they represent pests and diseases. We expand here upon some of the important areas that must be improved in NextGen models.
Improved statistical modeling of within-season pest and disease threats using automated data collection and cloud computing. It is now possible to collect weather data continuously from ground-based sensors and to merge these data with medium-term weather forecasts and remote sensing data on crop growth and pest and disease damage. (Both growth and damage can be detected by satellite or drone by monitoring the crop's spectral properties.) Then, using sophisticated statistical modeling done centrally, real-time advice can be distributed to farmers through the web or through mobile phones, enabling them to take precautionary actions.

Livestock Production
There are a number of areas in which advances in livestock modeling could improve the information needed to support the Use Cases identified in Box 1, for farm-level and landscape-scale decisions.

For farm-level decision support:
More comprehensive livestock models covering a wide diversity of ruminant species, adequately pre-parameterized for most common situations and with default values for users to parameterize models to their conditions. Summary models from comprehensive, dynamic models for on-farm support. This work includes summary models for intake, production and greenhouse gas emissions calculations. Some of these summary models could be developed as mobile phone technologies.
Development of extensive, standardized feed libraries linked to a GEO-WIKI for improving our mapping of feeds globally, but also to build a library that then can be used for deriving functions of feed quality for different agroecological conditions. One way this could be accomplished would be to expand existing household data collection protocols to include suitable data for livestock.
Livestock scenarios. Improved and consistent storylines are required for the livestock sector in all scenarios. These storylines can be produced as part of global and regional "representative agricultural pathways" being developed by AgMIP and other research teams. (Currently, such storylines exist only for the global "shared socioeconomic pathways" used in climate impact assessments; see Havlik et al. 2014; Herrero 2014.)

Pastures and Rangelands
Pastures and rangelands are integral to all livestock production systems and are often closely integrated with crop production systems (e.g., pasture in rotation). The biophysical components of these systems and driving data required to model them are largely similar to those of crop production systems (see first chapter), but management data tend to be sparsely available and representing continuity of plant populations is challenging. Advancing our ability to understand how grasslands are managed (for example, what species are planted, what inputs such as irrigation and fertilization are provided, and what grazing management in terms of timing and intensity is applied) is centrally important for improving our ability to model pasture and rangeland systems. At the same time, we have identified several features of next generation models necessary to improve the utility of models for pasture and rangeland systems, as we now discuss.
Areas in which advances in economic modeling could improve the information needed to support the Use Cases identified in Box 1 also correspond to farm-level and regional decision support.

Farm-level decision support
Advanced analytics need to be coupled with the data on management decisions that are becoming available through mobile technologies (e.g., tracking soil conditions, seeding and fertilizer application rates, pesticide applications) and their results (e.g., crop growth, yield). An example of this analytical capability is the AgTools software developed by several university extension programs, which allows managers to calculate short-term profitability and rates of return on long-term investments (www.agtools.org).
Similar proprietary software tools are being developed and used.
These analytical tools could be linked with modules that track or predict environmental outcomes such as soil erosion and net greenhouse gas emissions (e.g., AgBalance by BASF). Low-bandwidth versions of these tools need to be developed for use in areas where mobile phone technology is a limiting factor. Analytical tools need to be adapted to fit small-holder systems.

Environment and System Complexity
Current agricultural system models typically operate at the point/field scales (Fig. 4a) with an emphasis on vertical fluxes of energy, water, C, N and nutrients between the atmosphere, plant and soil root zone continuum. A holistic upscaling from the point source to the landscape scale (Fig. 4b) requires incorporation of several interacting, complex components, adding substantial complexity above and beyond the agricultural system itself. Thus, a major consideration in environmental modeling is how to best capture essential interactions while maintaining models that are feasible to implement with available data and computational resources.

Social Dimensions
As noted in section 3, a demand-driven approach is needed that begins with user-selected outcomes. Various outcomes are of interest in the context of sustainability. Here we identify some key outcomes that need to be incorporated into modeling approaches.
Income distribution and poverty. Most economic models provide an estimate of some components of income, but a complete characterization of income sources is needed to evaluate income distribution and poverty. Population-level outcomes are needed, not only means or averages.
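The point that means alone are not enough can be shown with a small sketch. It uses the poverty headcount ratio (the share of people with income below a poverty line), which is one common population-level indicator and is offered here as an assumption about what such an outcome might look like, not as the measure this paper prescribes. The incomes and the poverty line are hypothetical.

```python
import statistics


def poverty_headcount_ratio(incomes, poverty_line):
    """Share of the population with income strictly below the poverty line."""
    if not incomes:
        raise ValueError("incomes must not be empty")
    poor = sum(1 for income in incomes if income < poverty_line)
    return poor / len(incomes)


# Two hypothetical populations with the same mean daily income per person.
incomes_equal = [2.0, 2.0, 2.0, 2.0]
incomes_unequal = [0.5, 0.5, 1.0, 6.0]
poverty_line = 1.25  # assumed threshold, for illustration only

for label, incomes in (("equal", incomes_equal), ("unequal", incomes_unequal)):
    mean_income = statistics.mean(incomes)
    headcount = poverty_headcount_ratio(incomes, poverty_line)
    print(label, mean_income, headcount)
```

Both populations have the same mean income, yet their poverty rates differ sharply, so a model that reports only averages cannot distinguish them.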