State Spaces for Agriculture: A Meta-systematic Design Automation Framework

Agriculture is a designed system with the largest areal footprint of any human activity. In some cases, the designs within agriculture emerged over thousands of years, such as the use of rows for the spatial organization of crops. In others, designs were deliberately chosen and implemented over decades, as occurred during the Green Revolution. Currently, much work in agricultural science is focused on evaluating designs that could improve agriculture’s sustainability. However, approaches to agricultural system design are diverse and fragmented, relying on individual intuition and discipline-specific methods to meet stakeholders’ often semi-incompatible goals. This presents a risk that agricultural science will overlook non-obvious designs with large societal benefits. Here, we introduce a state space framework for agriculture, based on a common approach from computer science, to address the problem of proposing and evaluating designs computationally. This approach overcomes current limitations of agricultural system design by providing a general set of computational abstractions for exploring, and then selecting from, a much larger set of agricultural designs, which can then be empirically tested.

Existing approaches from computer science could be adapted to explore and evaluate a substantially larger number of potential designs, as has been achieved in other disciplines such as drug design, aerospace, and land use planning [7,8]. The need for a computer-automated design approach is particularly acute for agricultural systems, which combine extreme complexity with high levels of social, bioclimatic, and technological uncertainty [9]. In such a high-dimensional system, the universe of possible agricultural system configurations is impossible to explore exhaustively to find optimal outcomes, particularly with existing agricultural research techniques.
Here we describe one approach, state spaces, for describing and searching this universe of possible agricultural system designs, proposing familiar and yet-unimagined designs, and evaluating their outcomes. This approach, borrowed from computer science and complexity research, represents agricultural systems as states, inputs, and outputs; agents and forces that act upon these systems; and goals and objectives in a single commensurable way (footnote 1). The approach can flexibly be applied across agriculturally relevant scales (from genes to ecosystems and from days to decades) to inform challenges within most of the disciplines of agricultural science. We anticipate that this framework will aid in proposing designs for empirical research that are resilient under high levels of uncertainty.
In this paper, we describe the state space framework and its components including state space representation, state transition functions, state transition accounting, and state evaluation, and how these building blocks can be used to propose and evaluate agricultural system designs. Then, we illustrate how the framework can be applied across cultivar development, cropping system agronomy, and within-season precision agriculture management.

STATE SPACES
A state of a system is a single possible configuration of that system (Box 1). The state space of an agricultural system is then the collection of all possible states of the system (footnote 2). A state may represent, for example, a field at a moment in time or a given generation of a population under selection for breeding. In these examples, the state space represents the agricultural system under study: the cropping system or the breeding program. Critically, the state space includes states beyond those that have been observed, and is delimited only by what is biophysically possible (footnote 3). All that is required is that a process exists by which states of the system can be encoded, even if hypothetical or unknown. Depending on the application, this can be fine-grained (e.g., including details about individual plant root structures in soil), coarse-grained (e.g., including only a few environmental variables at the decadal scale), or somewhere in between. Importantly, the state space approach is agnostic with regard to how states are represented, requiring only that they are.
Footnote 1: Building upon computer science concepts such as finite automata [10], state machines [11], and program model checking [12].
Footnote 2: A state space can be thought of as a graph, where states are nodes and edges are transitions between states; transition functions define a set of transitions that occur. Summary functions aggregate groups of nodes into single nodes in a coarser-grained version of the graph.
Footnote 3: In order for state spaces to be computationally interesting, they must be computationally representable; any computational method must have state representations to compute over. In most real-world systems, a difficulty exists: complete representation of all world states is typically impossible, and no particular scheme of representation is obviously correct or best.
Consider a specific example of cropping systems design. The goal for a specific study could be to specify the most productive crop rotation for a geography. States can be defined at two different, nested spatial scales. The first is the landscape (Figure 1A). The second is the field (Figure 1B). At the landscape scale, we show five states that are made up of the crop rotation states at the field scale; this is only a small subset of the possible landscape states. At the field scale, we can look historically and represent the state space as a Markov model, which describes both the possible crop choices and the likelihood of transition from one crop choice to another. In this representation, the state transitions for each field in the landscape are the likelihoods of transitioning from corn to corn, corn to soybeans, soybeans to wheat, and so forth.
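The field-scale Markov representation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `estimate_transitions` and the three field histories are hypothetical, and transition likelihoods are estimated simply as observed year-to-year frequencies.

```python
from collections import Counter, defaultdict


def estimate_transitions(histories):
    """Estimate crop-to-crop transition probabilities from field histories."""
    counts = defaultdict(Counter)
    for history in histories:
        # Pair each year's crop with the crop grown the following year.
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    return {
        crop: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for crop, nexts in counts.items()
    }


# Hypothetical crop histories for three fields (one entry per year).
histories = [
    ["corn", "soybean", "corn", "soybean"],
    ["corn", "corn", "soybean", "wheat"],
    ["wheat", "corn", "soybean", "corn"],
]

probabilities = estimate_transitions(histories)
for crop, nexts in probabilities.items():
    print(crop, nexts)
```

Each row of the resulting table sums to one, so it can be read directly as the weighted edges leaving one crop state.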

TRANSITION FUNCTIONS
Given suitably represented states, it becomes possible to define state transitions (i.e. mathematical functions) (Figure 1C). These are simply the permissible (e.g. physically possible) transitions from one state to another. To the extent that such transitions are not idiosyncratic but follow predictable patterns, they can be represented as functions that map from an input state to an output state (footnote 4). In the language of graph theory, if each state is a node in a graph, then edges represent potential transitions. For an agricultural system, we might have transition functions representing the effects of irrigation (taking a set of drier states to wetter ones), fertilization (taking a set of states with lower nutrients to higher), crop selection, and so on. Transition functions are crucial for understanding which state transitions are considered possible and for representing biophysical processes computationally (Figure 1D) (footnote 5).
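As a rough sketch of the idea, transition functions can be written as ordinary functions that take one state and return another. The `FieldState` fields, units, and the three functions below are hypothetical choices made for illustration; a real transition function would wrap a biophysical model rather than simple addition.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FieldState:
    """A deliberately coarse, hypothetical encoding of one field."""
    crop: str
    soil_water_mm: float
    soil_nitrogen_kg_ha: float


def irrigate(state, amount_mm):
    """Map a drier state to a wetter one."""
    return replace(state, soil_water_mm=state.soil_water_mm + amount_mm)


def fertilize(state, amount_kg_ha):
    """Map a lower-nitrogen state to a higher-nitrogen one."""
    return replace(
        state, soil_nitrogen_kg_ha=state.soil_nitrogen_kg_ha + amount_kg_ha
    )


def plant(state, crop):
    """Crop selection as a transition."""
    return replace(state, crop=crop)


start = FieldState(crop="fallow", soil_water_mm=80.0, soil_nitrogen_kg_ha=20.0)
# Transitions compose: each call returns a new state and leaves `start` unchanged.
end = fertilize(irrigate(plant(start, "corn"), 25.0), 60.0)
print(end)
```

Keeping states immutable means every intermediate state remains available as a node in the graph view of the state space.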

SOFTWARE SYSTEM MODULARITY
The framework enables the transformation of agricultural system designs, an infinite state space, into tractable and practical summaries that can be operated upon by well-understood agent and algorithmic techniques (Figure 2). Many of these summarization and transition functions have long been implicit in the agricultural sciences, and through this framework we make them explicit: a summary function could simply encode and combine the human-level understanding of the states of an agricultural system (for example, the spatial arrangement of a farm and the sequence of its crops at a timepoint during a growing season) with a lower-level representation such as a space-time cube of the land and its constituent parts. Such an approach to encoding low-level state representations in summarized forms has been the basis for success in applying artificial intelligence to general game-playing problems, as well as for the success of general-purpose language models [7]. This is because the output of a summary function is simply another, more coarse-grained state space, upon which transition functions (between these higher-level states) can be defined (Figure 2). Thus, summary functions enable aggregation towards coarser representations that are more computationally tractable (footnote 6).
There are four benefits to this summarization approach. First, we make explicit the summarization and categorization of agricultural systems that is and always has been taking place [13]. Second, by defining summarization functions and their output state spaces, we produce reusable high-level representations that enable modularity of the agents that act upon those representations. Third, higher-level summary representations of state spaces are often more understandable to scientists and practitioners, enabling faster improvement and verification of the results of the design system. Fourth, much of the low-level representation of a state space is likely irrelevant in any specific context, and retaining that low-level state representation results in computational intractability; summarization enables agents to act solely upon the components that are relevant for their decision-making, lowering computational complexity (Figure 2).
As we progress towards simpler and coarser state spaces, the framework makes it possible for existing techniques from the computer science literature, whether reinforcement learning, random forest decision trees, or more deterministic techniques, to act upon the state representation and explore the transitions between states given some high-level objective. Regardless of technique, these form the basis of state spaces required for design automation.
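A summary function of the kind described above can be as simple as collapsing a gridded, per-cell representation into a handful of aggregate quantities. The sketch below assumes a hypothetical 3 x 3 grid of crop labels and a hypothetical function name, `summarize_landscape`; it is one possible summarization, not the one used by the authors.

```python
from collections import Counter


def summarize_landscape(grid):
    """Collapse a grid of per-cell crop labels into crop area fractions."""
    cells = [crop for row in grid for crop in row]
    counts = Counter(cells)
    return {crop: n / len(cells) for crop, n in sorted(counts.items())}


# Hypothetical fine-grained state: one crop label per grid cell.
grid = [
    ["corn", "corn", "soybean"],
    ["corn", "wheat", "soybean"],
    ["corn", "corn", "soybean"],
]

summary = summarize_landscape(grid)
print(summary)
```

The output is itself a state in a coarser state space, so transition functions and agents can be defined on crop fractions without reference to individual cells.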

DESIGNING AGRICULTURAL SYSTEMS WITH STATE SPACES
The last component required for design automation using state spaces is the creation of a design agent (Figure 1D). There are many prior definitions of design agent; we define a design agent as any program that has the ability to execute a transition function in order to explore the designs within a state space. There are many ways to operationalize this definition of an agent, from one that executes transitions in a state space randomly to one that does so deliberately using utility theory [14].
For example, an agent may be used to design nitrogen application rates and balance the tradeoff between nitrogen losses and grain yield, where larger nitrogen fertilizer applications increase yield with diminishing returns and rapidly increase nitrogen losses that degrade water quality [15]. Here, a state is the field given a specific amount of nitrogen applied, the cost of transition is the cost of increasing or reducing nitrogen application, and the benefit is some weighting between water quality and grain yield outcomes. A utility-based design agent could design new states (e.g. combinations of fertilizer rate, fertilizer form, resulting crop yield, and water quality), estimate the likelihood of reaching these states (e.g. subject to variable weather, application timing, etc.), and estimate the costs of transition and the resulting benefits (e.g. new equipment, different form, etc.), ultimately predicting high-value nitrogen rates and outcomes across a landscape. This utility agent is only one potential agent, though. One could instead employ a random design agent to approximate the range of possible nitrogen losses and yields based on simulated nitrogen rate configurations and thus provide a benchmark for the current state of the agricultural system.
Critically, high-accuracy forecasts of agricultural outcomes are not necessary for successful application of the state space framework and design agent. There are two reasons for this. First, it is possible to create agents that generate appealing and useful designs without any notion of how the world works, as is the case for many large language models and game-playing agents. Agents could instead be trained by observing humans engage in design or by designing against themselves. Second, in cases where accurate biophysical prediction matters, the emphasis for a successful agent is primarily on transitions among states. In this second, explainable approach to building a design agent, a major research goal will be developing "world models" to estimate the likelihoods of state outcomes given a transition function. For example, to estimate the likelihood of yield and water quality outcomes, a design agent does not need exact point predictions of a state in order to evaluate potential new states and transition functions; it needs only accurate estimates of the likelihood that a state can be reached given a transition function, and of the state's value if it were reached. Many methods will produce such likelihood estimates. Most rely on Monte Carlo simulation with varying input parameters for models. This model requirement underscores the need for general-purpose and scalable models of agricultural systems that are parameterized automatically for different geographies and design objectives. This ability of design agents to handle uncertainty is a major feature of the approach: even if the outcomes of transitions between states are highly uncertain, so long as there is an accurate evaluation of the likelihood of reaching each state and of the value of the target state, the state space and design agent will provide robust design recommendations.
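The contrast between a utility-based agent and a random agent can be sketched as follows. Everything numeric here is an assumption made for illustration: the yield and loss response curves, their coefficients, and the `loss_weight` that trades water quality against yield are hypothetical and are not taken from [15].

```python
import math
import random


def grain_yield(n_rate):
    """Hypothetical yield response (t/ha) with diminishing returns to N."""
    return 12.0 * (1.0 - math.exp(-0.015 * n_rate))


def nitrogen_loss(n_rate):
    """Hypothetical N loss (kg/ha) that accelerates with application rate."""
    return 0.0005 * n_rate ** 2


def utility(n_rate, loss_weight=0.1):
    """Weighted benefit: yield minus a penalty on nitrogen loss."""
    return grain_yield(n_rate) - loss_weight * nitrogen_loss(n_rate)


def utility_agent(candidate_rates, loss_weight=0.1):
    """Deliberately pick the candidate state with the highest utility."""
    return max(candidate_rates, key=lambda rate: utility(rate, loss_weight))


def random_agent(candidate_rates, n_samples, seed=0):
    """Sample states at random to benchmark the range of outcomes."""
    rng = random.Random(seed)
    sampled = [rng.choice(candidate_rates) for _ in range(n_samples)]
    return [(rate, grain_yield(rate), nitrogen_loss(rate)) for rate in sampled]


rates = list(range(0, 301, 10))  # candidate N rates in kg/ha
best_rate = utility_agent(rates)
benchmark = random_agent(rates, n_samples=20)
print(best_rate, min(b[1] for b in benchmark), max(b[1] for b in benchmark))
```

Changing `loss_weight` is the simplest way to see how a different stakeholder weighting moves the selected rate.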
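A minimal Monte Carlo sketch of this likelihood-and-value reasoning is shown below. The toy yield model, the rainfall distribution (mean 500 mm, standard deviation 100 mm), and the function names are all hypothetical stand-ins for a real crop model and weather generator.

```python
import random


def simulate_yield(n_rate, rainfall_mm):
    """Hypothetical toy model: yield (t/ha) limited by nitrogen or by water."""
    nitrogen_limited = 0.06 * n_rate + 2.0
    water_limited = 0.02 * rainfall_mm
    return min(nitrogen_limited, water_limited)


def likelihood_of_target(n_rate, target_yield, n_draws=10_000, seed=42):
    """Estimate the probability that a transition reaches the target state."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        rainfall = rng.gauss(500.0, 100.0)  # assumed seasonal rainfall, mm
        if simulate_yield(n_rate, rainfall) >= target_yield:
            hits += 1
    return hits / n_draws


def expected_value(n_rate, target_yield, value_if_reached):
    """Likelihood of reaching the state times its value if reached."""
    return likelihood_of_target(n_rate, target_yield) * value_if_reached


for rate in (100, 150):
    print(rate, likelihood_of_target(rate, target_yield=9.0))
```

The agent never needs a point prediction of yield here; it only compares likelihood-weighted values across candidate transitions.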

Example Applications
We describe three potential application areas. For each, we describe how agricultural scientists currently approach the design problem within each area. Then, we describe how the problem area maps into the state space framework. Finally, we end each with a description of what insights the state space framework provides.

Cultivar Development
Description of the Design Problem
Plant breeders make selections on genetic variation for target traits. Consider new cultivar development for biotic or abiotic stress tolerance. First, a source of tolerance must be identified, which can require screening hundreds or thousands of accessions. Next, experimental populations are developed, which result in tens of thousands of progeny. Crosses with tolerant parents may cause a population to fall initially in performance (a notion with a well-understood but largely implicit definition that we aim to make explicit), so that several generations are required to recover good yield and quality characteristics [16,17]. Progress depends on the size of the population, selection intensity, and genetic variability for the target trait. Current strategies for breeding include multi-environment trials to sample target populations of environments [18], genomic prediction modeling [19], and speed breeding [20].

State Space Description of the Problem
The transition function for the state space is one cycle of selection and each generation can be represented as a summarized state of individual plants. Drawing from concepts from evolutionary biology, each generation can be evaluated by a design agent based on its location on a fitness landscape [21]. The ability to transition between states, or traverse the fitness landscape, depends on the genetic features of the trait (trait architecture and heritability), the characteristics of the species (e.g. the mating system: clonal, outcrossing, selfing), and the current state of the population (where it is located on the landscape). As the design agent calls the transition function to move the population through state space, the ability of the design agent to recover specific desired properties depends on the outputs associated with various state transitions.
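One way to picture a cycle of selection as a transition function is the toy simulation below. It is a sketch under strong simplifying assumptions that are not from the source: a single trait, truncation selection, response to selection given by the breeder's equation (response = heritability x selection differential), and constant trait variance across generations.

```python
import random
import statistics


def selection_cycle(population, selected_fraction, heritability, rng):
    """One cycle of truncation selection: maps a generation to the next."""
    n_selected = max(1, int(len(population) * selected_fraction))
    parents = sorted(population, reverse=True)[:n_selected]
    mean_now = statistics.fmean(population)
    selection_differential = statistics.fmean(parents) - mean_now
    # Breeder's equation: response = heritability * selection differential.
    next_mean = mean_now + heritability * selection_differential
    spread = statistics.pstdev(population)  # assumed to carry over unchanged
    return [rng.gauss(next_mean, spread) for _ in population]


def summarize(population):
    """Summarized state of a generation, as seen by a design agent."""
    return {
        "mean": statistics.fmean(population),
        "sd": statistics.pstdev(population),
    }


rng = random.Random(1)
generation = [rng.gauss(100.0, 10.0) for _ in range(500)]  # hypothetical trait
history = [summarize(generation)["mean"]]
for _ in range(5):
    generation = selection_cycle(generation, 0.2, 0.4, rng)
    history.append(summarize(generation)["mean"])
print([round(m, 1) for m in history])
```

A genome-to-phenome model could replace the body of `selection_cycle` without changing how an agent calls it, which is the modularity the framework relies on.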

Novel Insights into the Problem Area from State Spaces
Breeders have in many respects been using methods akin to state spaces to optimize the selection of new material [22] (footnote 7). However, explicitly framing crop breeding using a state space approach has the potential to overcome the longstanding challenge of interoperability with other sub-disciplines in the agricultural sciences (e.g., cropping systems design), provided that they are also framed using state spaces. From the state space perspective, this interdisciplinary interfacing can readily make use of emerging discipline-specific approaches (e.g., genomics and physiological models for crop prediction [23,24]) as transition functions that map one state to another. As a more comprehensive understanding emerges of the underlying processes of crop physiology and genetics, current and future genome-to-phenome models can be readily swapped in and out of this framework.

Cropping Systems
Description of the Design Problem
A preliminary evaluation of a new cropping system design requires a minimum of three site-years; a longer study is necessary to address emerging challenges, including resilience to climate variation. An exceptionally large number of potential cropping system designs emerges given the species, cultivars, and management choices available in a single environment.

State Space Description of the Problem
A current long-term research program is adapting cropping systems to increasingly erratic weather in the western US corn belt, which is currently dominated by the corn-soybean rotation [25]. This experiment is testing five annual crop rotations of up to 4 years in length with a subset of seven annual crops using locally common management. However, a total of 721 rotations of up to 4 years are possible. While some of these may be more favorable than the five currently being studied, studying them all is infeasible for a long-term experiment to capture sufficient weather variation.
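The source does not state how the 721 rotations were counted. One convention that reproduces that figure, assumed here purely for illustration, treats a rotation as a repeating cycle of 2 to 4 years drawn from seven crops, counts cyclic shifts of the same sequence once (corn-soybean is the same rotation as soybean-corn), and excludes sequences that merely repeat a shorter cycle, which also excludes continuous monoculture.

```python
from itertools import product


def canonical(seq):
    """Representative of a rotation that ignores which year it starts in."""
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def is_primitive(seq):
    """True if the sequence is not a repetition of a shorter cycle."""
    return all(seq[i:] + seq[:i] != seq for i in range(1, len(seq)))


def count_rotations(n_crops, max_len, min_len=2):
    """Count distinct repeating crop cycles under the assumed convention."""
    seen = set()
    for length in range(min_len, max_len + 1):
        for seq in product(range(n_crops), repeat=length):
            if is_primitive(seq):
                seen.add(canonical(seq))
    return len(seen)


print(count_rotations(n_crops=7, max_len=4))  # 721 under this convention
```

Under the same convention, allowing single-crop (1-year) cycles as well adds the seven monocultures, giving 728.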

Novel Insights into the Problem Area from State Spaces
The state space framework can guide the proposal and evaluation of unstudied rotations. One might combine crop models with soil physical and nutrient models to infer crop and soil outcomes based on weather and known or inferred rotational effects in order to establish the state transition functions [26]. Management practices, including planting date, fertilizer inputs, or the addition of cover crops or intercropped forage legumes, might also be varied using data from nearby experiments [27]. Transition functions would account for estimates of yield, inputs, and changes in soil properties from each transition across single and multiple years in a rotation. Based on these outcomes, the design agent may select favorable (high expected value) transitions to each possible subsequent state based on the current crop state, resulting in the identification of locally adapted n-year rotations without exhaustively modeling each of the 721 possible rotations of up to 4 years. Agricultural scientists could then initiate empirical study of the best-performing rotations.
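The selection of high-expected-value transitions from the current crop state can be sketched as a greedy agent. The table of expected values below is entirely hypothetical (arbitrary units); in practice each entry would come from crop and soil model output. A greedy choice is only one possible strategy and is not guaranteed to find the best rotation overall.

```python
# Hypothetical expected value (arbitrary units) of growing `next` after `prev`.
EXPECTED_VALUE = {
    "corn": {"corn": 5.0, "soybean": 8.0, "wheat": 6.0},
    "soybean": {"corn": 9.0, "soybean": 4.0, "wheat": 7.0},
    "wheat": {"corn": 7.5, "soybean": 7.0, "wheat": 3.0},
}


def greedy_rotation(start_crop, n_years, expected_value=EXPECTED_VALUE):
    """Build an n-year sequence by always taking the best next transition."""
    rotation = [start_crop]
    total = 0.0
    for _ in range(n_years - 1):
        options = expected_value[rotation[-1]]
        next_crop = max(options, key=options.get)
        total += options[next_crop]
        rotation.append(next_crop)
    return rotation, total


rotation, total = greedy_rotation("wheat", n_years=4)
print(rotation, total)
```

Only three transitions are evaluated per year here, rather than every full rotation, which is the saving the text describes.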

Parallels to tropical multi-species mixtures
A similar approach might be taken to study perennial multi-species mixtures in the tropics, such as intercropped and integrated coconut-cacao-animal systems [28]. However, it is unclear which plant and animal combinations would achieve the desired goals of stakeholders. Using crop growth models, the design agent would evaluate the summarized outcomes at the end of establishment, biannually for coconut and breadfruit (which are planted in orchards together), and every few months for chickens; management of both trees and animals might be selected for highly favorable outcomes based on possibilities at the current state. Multiple long-term simulations would allow selection of favorable starting tree configurations, species, or varieties based on management goals and decision horizons.

Within-Season Management
Description of the Design Problem
Many decisions alter the growth and development of crops during the season, including planting date, tillage, nutrient inputs, pesticide use, grazing, and harvest date. The combinations of species and management within a season lead to an exceedingly large state space.

State Space Description of the Problem
For example, consider an intercropping agrivoltaic system where there is a cover crop under the solar panels and vegetable production in the rows between panels. To set up a viable production farm, there may be a need to test 3 cover crops, 2 animal species (e.g., for grazing cover crops), 10 vegetables with 5 cultivars of each vegetable, 2 planting dates, 4 harvest times, and 3 different pest management scenarios, for a total of 7200 possible states in a single location [29]. This amount of empirical testing is not feasible, even to identify the optimum for a single location.
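The count of 7200 is the product of the number of options for each decision, which the short sketch below reproduces. The dictionary keys are hypothetical labels for the factors listed in the text; the enumeration is lazy, so individual states can be generated without holding the whole design space in memory.

```python
from itertools import product
from math import prod

# Number of options per decision, as listed in the text.
factors = {
    "cover_crop": 3,
    "animal_species": 2,
    "vegetable": 10,
    "cultivar_per_vegetable": 5,
    "planting_date": 2,
    "harvest_time": 4,
    "pest_management": 3,
}

n_states = prod(factors.values())
print(n_states)  # 7200


def enumerate_states(factors):
    """Lazily yield every combination as a dict of option indices."""
    names = list(factors)
    for combo in product(*(range(n) for n in factors.values())):
        yield dict(zip(names, combo))


first_state = next(enumerate_states(factors))
print(first_state)
```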

Novel Insights into the Problem Area from State Spaces
There are few scaled agrivoltaic production systems, but there are comparable agroforestry systems. From the perspective of the design agent, there may be little difference in photosynthetic activity under a tree canopy or a solar panel canopy, so new systems can be assessed and their outcomes inferred without having been empirically tested. This demonstrates how abstractions that allow for modular thinking can limit the number of combinations that need to be tested in a given context, or help to reimagine what combinations can be used. By identifying the management that enables favorable transitions among states (Figure 1), design agent-selected choices can guide real-world testing of solutions that are likely to be practical and to meet the needs of the researcher.

Conclusion
Agriculture has served as the foundation of human civilization across cultures, resulting in a rich array of system designs. The state space framework outlined here enables the automated design of agricultural systems to explore that full breadth and beyond. Design agents search agricultural state spaces in order to identify systems that can meet the changing demands of an uncertain future. The practical implementation of an automated design system requires modularity in both the conceptualization of agricultural systems (e.g. individual-based models based on biophysical principles) and the software components used to define states. The practical next steps of implementing such a system will require rapid proposal and disposal of many sub-modules to work toward automated design. Thus, we likely need to move the field toward deliberate consideration of abstractions that compose cleanly and enable modularity, and toward a practice in which we can iterate on individual contributions within sub-fields without siloing that knowledge. In this way, agricultural scientists may need to think more like computer scientists, holding abstractions and representations more loosely.
"[A]bstraction is a quintessential activity of computer science – the intellectual tool that allows computer scientists to express their understanding of a problem, manage complexity, and select the level of detail and degree of generality they need at the moment. Computer scientists create and discard abstractions as freely as engineers and architects create and discard design sketches" [30].
The result may be that, instead of relying on the intuition of individual scientists and long-held abstractions to generate new designs, the design of agricultural systems may be made more resilient in the face of uncertainty through:
1. Formalizing the intuition of expert scientists for what constitutes a resilient agricultural system to establish goals for automated design agents,
2. Facilitating the borrowing and integration of modularized knowledge across disciplines by providing a common language of state spaces, aiding multidisciplinary research, and
3. Accelerating innovation by generating computer-aided design systems that can infer novel agricultural configurations with a high likelihood of societal benefit, allowing us to make the most of scarce time, space, and money in empirically evaluating new agricultural system designs.
In this way, the State Spaces for Agriculture framework we propose is about the formalization of a computational imagination that provides a flexible and general approach to conceptualizing digital agriculture research to motivate and support empirical research and development on the most promising of designs in an uncertain world.

Box 1. Understanding State Spaces.
A) A discrete example of state transitions on a chess board, specifically the opening sequence known as the Queen's Gambit. This sequence provides a known example of a series of states that provides an advantage to a specific player. This abstraction of states, state spaces, and state transitions that advantage a scenario can be used when exploring any system, such as an agroecological system.
B) Identifying a route between two different cities. Tools such as Google Maps leverage the concept of state spaces by having accounting functions for the value (cost) of any given road segment, making use of edge weights and an agent/algorithm to identify a reasonable path between locations, where the agent/algorithm operates on an (internal) summary of the underlying ground truth from Google's mapping infrastructure (e.g. Street View cars) and public data. A reasonable path is typically intuitively defined by a person, but the goal of defining the transitions is to make the implicit assumptions explicit and enable generalizability of both representations and the agents/algorithms that act upon them. In this analogy, cities/locations are states and the state space includes all cities/locations in the region; the transition functions are road segments from one city to another. It is sometimes the case that a desirable result of state transitions (e.g. arriving at Darwin) will go through undesirable intermediate states (e.g. The Outback).
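The route-finding analogy in Box 1 can be sketched with Dijkstra's algorithm, used here as one standard way to search weighted edges; the source does not say which algorithm any mapping tool uses. The road graph and its segment costs are hypothetical and are not real travel times or distances.

```python
import heapq


def cheapest_route(roads, start, goal):
    """Dijkstra's algorithm over a dict-of-dicts graph of road segments."""
    frontier = [(0.0, start, [start])]  # (accumulated cost, city, path so far)
    best_cost = {start: 0.0}
    while frontier:
        cost, city, path = heapq.heappop(frontier)
        if city == goal:
            return cost, path
        if cost > best_cost.get(city, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, segment_cost in roads.get(city, {}).items():
            new_cost = cost + segment_cost
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical segment costs (arbitrary units), keyed by origin then destination.
roads = {
    "Adelaide": {"The Outback": 15.0, "Sydney": 14.0},
    "The Outback": {"Darwin": 16.0},
    "Sydney": {"Brisbane": 10.0},
    "Brisbane": {"Darwin": 35.0},
}

cost, path = cheapest_route(roads, "Adelaide", "Darwin")
print(cost, path)
```

With these assumed costs the cheapest path to the desirable goal state passes through the less desirable intermediate state, as the box notes can happen.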

Figure 1.
A) Simulated landscape states. Each year contains nine landscapes, which contain either wheat (orange), soybean (green), or corn (yellow); these states change between years. B) Each field change between years can be represented as a count, a probability, and a weighted edge graph. C) Using accounting hooks, the value of each transition can be accounted for, helping to explain the probability of change. D) A design agent can explore the transition history, identify potential states, and move between states of different probabilities to create new configurations.
Figure 2.
Here, a multi-stage transformation converts an agricultural state space of infinite complexity into practical summaries that can be operationalized using well-understood agent techniques. Transition functions apply within a single level of representation, describing the pathways for transition from one state to another (and can have associated costs that are accounted for). The data can be of any level of complexity (e.g. mapping layers, images, yield, ecosystem services). Summarization enables agents to act solely upon the components that are relevant for their decision-making, which are specified by the user. This flexible framework allows for modularity for use cases that are of interest to any scientist.