Cooperatively improving tallgrass prairie with adaptive management

Adaptive management (AM) is widely used as an approach for learning to improve resource management, but successful AM projects remain relatively uncommon, with few documented examples applied by natural resource management agencies. We used AM to provide insights into actions that would be most beneficial for the management of native tallgrass prairie plant communities in western Minnesota and eastern North and South Dakota, USA. After 9 yr of data collection and learning, we report on whether the condition of the prairie improved with management and which actions and frequency of action allowed improvement. Our approach to AM employed Bayesian inference to generate annual management recommendations at site- and state-dependent scales. We also used a logistic regression approach to complement the output from the AM model and evaluate the more general conditions that led to attaining management goals. Overall, the cover of native plants increased for low-quality sites, and among the management practices considered, we found that burning most effectively enhanced the native prairie plant community and increased the dominance of native indicator species. Contrary to expectations, the results also indicate that grazing on sites that started in a poor condition was less likely to show improvements in the native plant community. Complementing AM with more traditional statistical analyses can help inform the iterative double-loop learning phase of the AM framework. Adaptive management has many challenges, but we demonstrate that multi-agency AM can be successful. Keys to success include starting the project with an in-person, in-depth workshop; standardized protocols and a centralized database; a core project team with multi-disciplinary backgrounds; stability in project leadership; and regular communication to meet annual deadlines.


INTRODUCTION
Natural resource managers face a host of uncertainties regarding the best management practices to achieve their resource objectives. The peer-reviewed literature that addresses land management challenges, especially climate change and invasive species, often recommends the use of adaptive management (AM) as an effective means to address uncertainty and improve the effectiveness of management decisions (Walters 1986, Knutson and Heglund 2012). However, despite a large body of literature calling for and describing AM, there are few documented examples of AM fully implemented by a management agency, including essential components like iterative monitoring, updating of models, and reporting that includes management recommendations informed by the monitoring (Rist et al. 2012, Westgate et al. 2013).
The current state of the North American grassland system and the process of grassland management lend themselves well to an AM approach. In North America, native prairie has suffered dramatic habitat loss, fragmentation, and degradation (Samson et al. 2004, Wright and Wimberly 2013, Lark et al. 2015). In recent years, the native plant diversity of the northern prairies has declined with the spread of invasive plants, including woody encroachment (Grant and Murphy 2005), Kentucky bluegrass (Poa pratensis), and smooth brome (Bromus inermis; Grant et al. 2009, DeKeyser et al. 2013, Toledo et al. 2014, DeKeyser et al. 2015). Land management agencies struggling with these issues could benefit from learning from management. Therefore, we employed AM to guide management actions with objectives to increase the cover of native vegetation, increase floristic diversity and, therefore, create structural diversity on our remnant prairies. Adaptive management can be used when five criteria of natural resource management are met (Moore et al. 2011b): (1) a decision is repeated in time or space, (2) there is a discrete set of choices to choose from, (3) there is structural uncertainty that can be described with competing responses to each action, (4) a value system for the outcome of the action is clearly articulated, and (5) a monitoring system exists to compare an observed response to actions to model predictions. We considered the problem of prairie management to be an ideal case for AM because it met three of the criteria above, and we used the AM framework to add the final two. First, the problem is an ongoing issue shared by nearly all resource managers across the northern prairies. Native prairie once dominated this landscape, but only small, often invaded, remnants remain due to agricultural intensification and, more recently, energy development (Kuvlesky et al. 2007, McDonald et al. 2009, Pruett et al. 2009, Northrup and Wittemyer 2013).
Second, the management alternatives for addressing the problem of increasing dominance of invasive plant species are finite (e.g., burning, grazing, rest) and often time and resource intensive. Indeed, many field station managers can only employ active management on a small subset of their management units, due to limited staffing and budgets, but there is a strong desire to employ these limited resources where they will most effectively achieve management objectives. Finally, it is difficult or impossible for a single manager to reduce the uncertainty about the effectiveness of management on plant communities by working independently. It can take many years for ecosystems to change in response to management. Furthermore, staff turnover is common, and individual learning is often truncated by short tenures and limited documentation. Even with long tenure, management of a small number of units in a restricted geographic area cannot reveal general patterns of ecosystem responses to management employed across a range of soil types and geographies. The power of replication is gained when managers collaborate by standardizing both management actions and monitoring protocols.
We focused on a major uncertainty regarding management of native grassland ecosystems common to resource managers in the northern tallgrass prairie: whether the condition of native plant communities improves with management action, and which management actions and frequency of management actions lead to improvement. Fire and grazing are both historical disturbances and current management techniques employed by land managers across the region (Ryan 1990, Collins 1992, Kral et al. 2018). Despite the history of these disturbances in the northern tallgrass prairie, most of our understanding about the effects of fire and grazing in the tallgrass prairie has come from the central or southern Great Plains (Engle and Bidwell 2001, Fuhlendorf and Engle 2004, Kitchen et al. 2009). Recent studies have started to demonstrate the benefits of fire in the northern prairies (Kral et al. 2018), but managers still face considerable uncertainty about when and how often to use fire, grazing, or rest to improve the condition of the prairie. The scope and scale of this problem brought managers from multiple agencies (e.g., Minnesota Department of Natural Resources, The Nature Conservancy, and the U.S. Fish and Wildlife Service) together to address the uncertainty surrounding the influence of management on remnant prairie plant communities, and we engaged AM advisors and database experts to help set up the project and to develop tools that included annual model updates and reporting based on current and previously collected data (Hunt et al. 2015, 2016).
The AM framework was developed using a structured decision-making (SDM) process (Lyons et al. 2008, Reynolds et al. 2016). Structured decision-making has emerged in the field of natural resource management to help managers make decisions when faced with complex ecological systems and management problems (Lyons et al. 2008), and AM is a version of SDM where the decision, actions, and consequences are iterated over time, allowing for the ability to learn about the decision. In this case, the SDM process involved conservation practitioners from multiple agencies and at many levels of natural resource decision making. As a group, we defined the problem and management objectives and alternatives and developed a set of competing models that represent the potential responses of management units to actions.
Given the paucity of on-the-ground examples of AM projects in the peer-reviewed literature and the continued confusion within the natural resource management community about how to employ AM (Rist et al. 2012, Knutson et al. 2017), the goals of this paper are to describe a successful multi-agency AM project and to summarize initial results from the AM model and a logistic regression model that assesses which management actions, if any, were improving the tallgrass prairie. We considered this AM project successful based on three metrics: (1) the information provided to land managers to support their site-level management decisions annually, (2) the overall learning about the effects of management on prairie, and (3) the continued commitment and collaboration among the project partners. We summarize the lessons learned to assist others attempting to implement a large-scale AM project.

STUDY AREA
The geographic scope of this project included the tallgrass portions of western Minnesota and eastern North and South Dakota (Fig. 1). About 90% of the sites were in Minnesota, including the entire north-south gradient of the state. The mean annual temperature and precipitation for the region range from 3.2°C and 47.2 cm in the north to 7.8°C and 74.9 cm in the south (based on data range 1981-2010; https://www.usclimatedata.com/climate/minnesota/united-states/3193). Sites span two Ecological Provinces: the Tallgrass Aspen Parklands to the north and the Prairie Parklands in the south and west (Bailey et al. 1994, Cleland et al. 1997). The sites were all unplowed (remnant) prairies. Prairie community types included Wet Prairie, Mesic Prairie, and Dry Prairie (MNDNR 2005). Dominant vegetation varied by community type and included big bluestem (Andropogon gerardii), yellow Indian-grass (Sorghastrum nutans), prairie cordgrass (Spartina pectinata), and prairie dropseed (Sporobolus heterolepis) in wet-to-mesic prairie and little bluestem (Schizachyrium scoparium) and porcupinegrass (Hesperostipa spartea) in dry prairie (MNDNR 2005). The most common invasive species were non-native cool-season grasses, Canada thistle (Cirsium arvense), white sweetclover (Melilotus alba), and yellow sweetclover (Melilotus officinalis; Appendix S1). Study sites included only unplowed prairie, but the condition of the sites ranged from having a high diversity of native plant species and a low abundance of invasive plant species to those dominated by invasive species with relatively few native species.

Adaptive management framework
We began the SDM process to establish the framework for the AM project with an interagency team in 2007 (Lyons et al. 2008, Reynolds et al. 2016). Following the steps of the SDM process, we developed a standardized protocol to monitor responses, defined discrete system states, and created a centralized database to house the collective data across agencies. After a pilot year in 2007, the full AM project began in 2008 with data collection through 2016.
Objectives.-The main objective was to maximize the quality of prairie by considering various management alternatives. Although all participating agencies have different mandates, the team agreed that the quality of prairie was defined by three sub-objectives: (1) to maximize the percent cover of native prairie vegetation, (2) to maximize the floristic diversity of native prairie ecosystems, and (3) to maximize the structural diversity of native prairie ecosystems.
Adaptive management cycle and management alternatives.-The AM cycle was implemented at a management unit level, which was defined as a contiguous block of unplowed prairie that experiences a consistent management scenario. For this project, all units were managed for conservation purposes.
The annual management cycle for the project ran from 1 October through 30 September (Appendix S3: Table S1). We used a 3-yr time step for each management unit because we were interested in both the type of treatment and the frequency with which treatments were employed. With an annual time step, we could not evaluate the influence of the frequency of management actions through time. 30 September was the end date for our management year because that was the latest the plant communities could reliably be monitored after the growing season. Fall actions were captured during the next 3-yr period.
We evaluated four management practices and two categories of management frequency. The management practices evaluated by the model were burning, grazing, the combination of burning and grazing, and rest, and we set limits on the parameters for each of these practices. A burn during any time in the management year (1 October-30 September) qualified if ≥50% of the vegetation in the unit was exposed to fire. A unit was considered grazed if ≥25% of the area or biomass in the unit was affected by livestock. A combination of burn and graze was defined as both actions occurring within one management year (e.g., patch burn grazing; Fuhlendorf and Engle 2001). Rest occurred when no management actions were taken on that unit in that management year. We also evaluated two levels of management frequency over the 3-yr management cycle, low and high. A low-frequency treatment was defined as one or no non-rest management practices applied over the 3-yr time step. A high-frequency treatment was defined as two or more non-rest management practices applied over the 3-yr time step.
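The thresholds and frequency categories above can be sketched in Python. This is a minimal illustration, not the project's code: the function names are hypothetical, the single-letter codes follow the B, G, R, and C letters used for action histories elsewhere in the text, and counting a within-year combination as a single non-rest year is an assumption.

```python
# Single-year codes follow the letters used in the text (B, G, R) plus C for a
# within-year burn + graze combination.
NON_REST = {"B", "G", "C"}


def year_action(burn_fraction, graze_fraction):
    """Classify one management year from the fraction of the unit affected."""
    burned = burn_fraction >= 0.50   # >=50% of vegetation exposed to fire
    grazed = graze_fraction >= 0.25  # >=25% of area or biomass affected by livestock
    if burned and grazed:
        return "C"
    if burned:
        return "B"
    if grazed:
        return "G"
    return "R"


def frequency_class(history):
    """Return 'low' or 'high' for a 3-yr history such as 'RBR' or 'BGR'."""
    if len(history) != 3 or any(a not in NON_REST | {"R"} for a in history):
        raise ValueError("history must be three codes drawn from B, G, C, R")
    # Assumption: each non-rest year counts once, including combination years.
    active_years = sum(1 for a in history if a in NON_REST)
    return "high" if active_years >= 2 else "low"
```

For example, `frequency_class("RBR")` gives the low-frequency category and `frequency_class("BGR")` the high-frequency category.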
Field monitoring methods.-The field protocol to monitor outcomes was designed to evaluate which management alternatives and frequency of actions best achieved the objectives. Sample units were permanent transects, distributed randomly within a management unit at a density of one transect per 10 acres, with a minimum of five and a maximum of 15. Sites <50 acres in size had five transects, and sites larger than 150 acres had 15 transects. This rule was made after a power analysis of the data in 2012, and sites with higher-than-necessary transect densities were adjusted. Transects were 25 m long, 0.1 m wide, and subdivided into fifty 0.5 m long plots (Grant et al. 2004).
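The transect allocation rule can be written as a one-line clamp. The sketch below is illustrative only; the function name is hypothetical, and rounding down between the minimum and maximum is an assumption, since the text does not state how fractional densities were handled.

```python
def transects_for_unit(acres):
    """Number of permanent transects for a management unit of a given size.

    One transect per 10 acres, with a minimum of 5 and a maximum of 15.
    Assumption: fractional counts are rounded down.
    """
    if acres <= 0:
        raise ValueError("acres must be positive")
    per_density = int(acres // 10)
    return max(5, min(15, per_density))
```

Under these assumptions a 30-acre unit receives 5 transects, a 100-acre unit 10, and a 200-acre unit 15.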
The vegetation attributes monitored included structure, functional composition, and plant species. The structural measurements included a visual obstruction reading (VOR) and litter depth to the nearest cm. One VOR was taken with a Robel pole at the center of each transect (Robel et al. 1970), and litter depth was measured every 5 m. Functional composition was evaluated for each 0.5-m plot along the transect based on the cover of native or invasive species, shrubby or herbaceous plants, and grass vs. forb mixture (Appendix S3: Fig. S1). Plant species composition was based on an indicator list of native and invasive species developed for the project and measured at two scales along the transect (Appendix S1). At the 0.5-m plot level, each invasive species present was recorded, and in a 3 m wide belt along the transect, the presence of any native or invasive indicator on the list was recorded.
System states.-We chose three categorical variables to define the overall state and quality of the grassland: the percent native cover, the proportion of native indicators present, and woody vs. herbaceous functional group. Together, these variables created 20 overall states (Table 1). We collapsed the higher proportions of native indicators if the native cover was <50% because the number of native indicators was low when natives were not dominant. The lower state numbers represent highly invaded, lower quality prairie, and the higher state numbers represent more native-dominated, higher quality prairie.
Defining knowledge of the system.-We framed the problem by developing two competing models; one model assumed that the action taken improved the system, as defined by changes in state, while the other model assumed the action degraded the system. Our belief in each of these competing models was represented by a model weight for each model, varying between 0 and 1 and summing to 1. We represented the two competing models as two separate state and transition models, which described how each state would respond to each management action. Using Bayesian inference, the monitoring data were used to determine which of the competing models best reflected the observed data.
For each state, we were uncertain about the expected likelihood of transitioning from one state to another because of management. Ultimately, each combination of starting state, ending state, and action received a probability that fell between zero and one, such that the sum of probabilities of transition from a single state to all other states was one. At the start of the project, we used pilot data to look at 3-yr transition rates for native cover, functional group cover, and proportion of native indicators. We used these transition rates to create reasonable transitions based on the product of each of the three components. These were the starting rates that we used to develop a Bayesian updating routine, in which we compare observed data with these competing models (see Appendix S3 for more details).
Defining the value system.-A clearly stated value system is a requirement of AM. Here, this value was based on two aspects of the predicted ecological outcomes from the state and transition model described above: (1) the outcome state, such that higher quality grassland was preferred, and (2) whether the action led to improvement or degradation. The management team worked together to define the relative values associated with transitions (Appendix S3). The worst outcome occurred when the action caused degradation from the most desirable state to the least desirable state, and the management team decided that the best outcome occurred when an action resulted in any state achieving the most desirable state.
Integrating biological knowledge and values.-Once the state and transition model was parameterized, we used Bayesian updating to incorporate the results of the monitoring data. We grouped together all observations with the same starting state and management history. Because we wanted to learn about the influence of both the frequency and type of action, we needed a way to learn about both factors from each individual management history. To do this, we developed a similarity index that described what information one action told us about a similar action. For example, if a rest-burn-rest sequence was performed over the 3-yr time step, it provided information about low-frequency actions and about rest and burn individually. We included a parameter in the Bayesian inference step that allowed us to use the insights gained from one action and apply them to all similar actions (Appendix S2).
For all observations with the same starting state and action, we used Bayesian inference to update the model weights. To minimize the effect of correlated observations, we averaged the updated weights across all observations (Gannon et al. 2010, 2013). At the outset of the project, we assumed there was complete uncertainty for each action and for each state. Therefore, the confidence that any action taken in any state caused improvement was 50%. We used Bayes' theorem (Bayes 1764) to update our confidence that the action taken led to improvement:

$$p(A_I \mid O_A) = \frac{p(O_A \mid A_I)\,p(A_I)}{p(O_A \mid A_I)\,p(A_I) + p(O_A \mid A_D)\,p(A_D)}$$

Here, $p(A_I)$ was the prior confidence that the action (A) taken was one that improved the state, and $p(O_A \mid A_I)$ was a conditional probability from the upper bound probability matrix that reflected the probability that the observed transition resulted from an action that led to improvement. Similarly, $p(A_D)$ was the prior confidence that the action taken was one that degraded the state, and $p(O_A \mid A_D)$ was a conditional probability from the lower bound probability matrix that reflected the probability that the observed transition resulted from an action that led to degradation. We incorporated the similarity of action i with action j, $SA_{ij}$, into the upper and lower bound probabilities. To determine the similarity of each 3-yr action with all other actions, we first grouped the actions into disturbance frequency (low or high) and type of action (graze; burn; within-year combination of graze, burn, and rest; and between-year combination of graze and burn and rest). For example, the burn-graze-rest action (BGR) is a high-frequency action, has 1 yr each of burn only and graze only, and is a combination among years. The project team worked together, so classifications were done with expert judgment (Appendix S2). We determined the difference between the two categories by calculating the sum of the absolute differences among the actions.
The difference between actions i and j, $DA_{ij}$, was calculated from two terms: the first term represents the average difference in frequency, F, and the second term represents the average difference in the type of action, T. The numerator in both terms represents the number of categories within each action group. We standardized the differences such that the largest difference is one and the smallest difference is zero, and then subtracted the standardized difference from 1 to create the similarity of action i with action j, $SA_{ij} = 1 - DA_{ij}$ (Appendix S2). We used the similarity index to modify the expected improvement such that, as similarity decreased, the expected transition probability shifted from the upper bound to the lower bound. The updating of all other actions j therefore represented the updated likelihood that action j led to improvement, $p(A_I^j)$, given that action i was taken and resulted in observation $O_A$.
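The Bayes' theorem update described above, followed by averaging across observations, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the likelihood values passed in stand for entries read from the upper and lower bound probability matrices, and the similarity adjustment is not included.

```python
def update_confidence(prior_improve, lik_improve, lik_degrade):
    """Posterior confidence that an action improves the state.

    prior_improve: prior p(A_I); the prior for degradation is 1 - prior_improve.
    lik_improve:   p(O_A | A_I), from the upper bound probability matrix.
    lik_degrade:   p(O_A | A_D), from the lower bound probability matrix.
    """
    prior_degrade = 1.0 - prior_improve
    numerator = lik_improve * prior_improve
    denominator = numerator + lik_degrade * prior_degrade
    if denominator == 0:
        raise ValueError("observation has zero probability under both models")
    return numerator / denominator


def averaged_update(prior_improve, observations):
    """Update once per observation from the same prior, then average.

    observations: list of (lik_improve, lik_degrade) pairs sharing a starting
    state and action. Averaging follows the text's approach to limiting the
    effect of correlated observations.
    """
    if not observations:
        return prior_improve
    posteriors = [update_confidence(prior_improve, li, ld) for li, ld in observations]
    return sum(posteriors) / len(posteriors)
```

Starting from the 50% prior, a transition that is three times more likely under the improvement model than under the degradation model (for instance 0.6 vs. 0.2, values chosen purely for illustration) raises the confidence to 0.75.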
The final step was the integration of knowledge with values, and here, we used stochastic dynamic programming (SDP). We expressed the transition probability from state s at time t to state j at time t + 1 under action a as $p(j_{t+1} \mid s_t, a_t)$ and the utility or reward as $r(s_t, a_t, j_{t+1})$. The goal of SDP was to find the path of decisions and states through time that produced the greatest expected cumulative reward from the system (i.e., creation of ideal grassland habitat). Under SDP, the optimization was prospective, meaning if circumstances took the manager from this optimal path, the new recommended sequence of decisions from that point forward was itself optimal.
The SDP solution for the Markov decision process was obtained through the iterative application of the recurrence relationship (Bellman 1957):

$$V(s_t) = \max_{d} \left\{ \sum_{j} p(j_{t+1} \mid s_t, a_t)\, r(s_t, a_t, j_{t+1}) + \sum_{j} p(j_{t+1} \mid s_t, a_t)\, V(j_{t+1}) \right\}$$

where d refers to the sequence of decisions over a defined time frame, which is 3 yr for this case. In other words, the value V of being in state s at time t was a sum of two parts: the expected reward over all possible transitions out of state s in the next time step (t to t + 1), plus the expected future value derived at time t + 1 onward, $V(j_{t+1})$. The first term was the product of the transition probability and reward for transitioning to state j at time t + 1 given starting state s and the action a taken at time t. The SDP was used to find the sequence of decisions d for which this expression was maximized.
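The recurrence described above can be illustrated with a small finite-horizon backward induction. The sketch below is not the project's implementation: the function name, the two-state system, and all transition probabilities and rewards are hypothetical values chosen only to show the mechanics.

```python
def backward_induction(P, R, horizon):
    """Solve a finite-horizon Markov decision process by backward induction.

    P[a][s][j]: probability of moving from state s to state j under action a.
    R[a][s][j]: reward for that transition.
    Returns (V, policy), where V[s] is the optimal expected cumulative reward
    from state s and policy[t][s] is the best action at time step t.
    """
    n_states = len(next(iter(P.values())))
    V = [0.0] * n_states  # value after the final time step
    policy = []
    for _ in range(horizon):
        new_V, step_policy = [], []
        for s in range(n_states):
            best_action, best_value = None, float("-inf")
            for a in P:
                # Expected immediate reward plus expected future value.
                q = sum(P[a][s][j] * (R[a][s][j] + V[j]) for j in range(n_states))
                if q > best_value:
                    best_action, best_value = a, q
            new_V.append(best_value)
            step_policy.append(best_action)
        V = new_V
        policy.insert(0, step_policy)  # earlier time steps go first
    return V, policy


# Hypothetical two-state example: state 0 = degraded, state 1 = high quality.
P = {
    "rest": [[0.9, 0.1], [0.3, 0.7]],
    "burn": [[0.5, 0.5], [0.1, 0.9]],
}
# Hypothetical reward: 1 for ending a time step in the high-quality state.
R = {a: [[0.0, 1.0], [0.0, 1.0]] for a in P}

values, policy = backward_induction(P, R, horizon=1)
```

Because the recursion is solved backward from the final time step, the policy at each step is optimal from whatever state the system is in, which matches the prospective property described in the text.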
Outputs.-The AM framework employed provided two types of output on an annual basis.
The first was a state-based 3-yr management action recommendation for every site monitored that year. States were assigned and analyzed in the model at the transect level, and recommendations were given by state. Most management units were variable in condition across the unit, and therefore, multiple states could be present within a unit. The different states within the unit could have different management action recommendations from the model. Because management happened at the scale of the unit and not the transect, the recommendations used for the unit were based on the state with the greatest representation in the unit. The second type of output provided was the updated confidence in each competing model given each potential action taken. These updated confidence values then guided which actions and what frequency of action taken would best improve the prairie condition for each of the 20 states.
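Scaling transect-level recommendations up to the management unit can be sketched as follows. This is an illustration only: the function name is hypothetical, the recommendation strings in the usage example are placeholders rather than model output, and breaking ties in favor of the lowest state number is an assumption that the text does not address.

```python
from collections import Counter


def unit_recommendation(transect_states, recommendation_by_state):
    """Pick the recommendation for the state with the greatest representation.

    transect_states: list of state numbers (1-20), one per transect in the unit.
    recommendation_by_state: mapping from state number to a 3-yr action code.
    Assumption: ties are broken in favor of the lowest state number.
    """
    if not transect_states:
        raise ValueError("unit has no monitored transects")
    counts = Counter(transect_states)
    top_count = max(counts.values())
    dominant_state = min(s for s, c in counts.items() if c == top_count)
    return dominant_state, recommendation_by_state[dominant_state]


# Placeholder recommendations for three states, for illustration only.
example_recs = {7: "GGR", 12: "RBR", 17: "RRR"}
state, action = unit_recommendation([7, 7, 17, 7, 12], example_recs)
```

In this example three of five transects are in state 7, so the unit receives the state 7 recommendation.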

Generalized linear mixed model approach
To compare the results of the AM model with a more traditional frequentist approach, we used a generalized linear mixed model with the same dataset to evaluate the effects of management alternatives on the percent native cover and the proportion of native indicators along the transects monitored. For this analysis, rather than treating the combination treatment separately as a different management action, it was treated as an instance of both a burning and a grazing alternative. All transects across all years and their management histories were included in this analysis, while the Bayesian model above used a subset of the transects that were monitored in a given year.
The percent native cover and proportion of native indicators were the response variables. While these component metrics, along with woody vs. herbaceous functional group, collectively determined the transect state in the state and transition model, we analyzed them separately because they were more ordinal than the states. We recoded these component metrics as binary variables in terms of whether or not a management goal was met (Appendix S3: Table S2). We chose to convert the response variables to a binary response rather than use the absolute observed change because management goals were different depending on the initial condition. For example, for units that begin in a high-quality state, the goal was to maintain that state (i.e., no change), whereas for sites that were initially in a low-quality state, the goal was improvement to a higher quality state (i.e., increase). Therefore, we chose to construct response variables for the models that reflected whether or not the management goal was achieved rather than the absolute movement between categories. The recoding to binary variables was done based on the change between the initial measurement and the final measurement, regardless of how many times a transect was measured.
To model the change in percent native cover and proportion of native indicators along transects, we fit a generalized linear mixed-effects model. We calculated the amount of grazing and burning activity as the number of times each action was recorded, divided by the total number of years between 2008 and the final year of observation for each transect. This allowed us to standardize the amount of grazing or burning activity for each transect regardless of when or how often observation data were collected.
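The standardized activity predictors can be sketched as follows. This is a minimal illustration with hypothetical names; counting the years inclusively (2008 through the final year of observation) is an assumption, and a within-year combination is counted toward both burning and grazing, as described for this analysis.

```python
def activity_proportions(actions_by_year, final_year, start_year=2008):
    """Proportion of years burned and grazed for one transect.

    actions_by_year: mapping from year to a code, where "B" = burn,
    "G" = graze, "C" = within-year combination, "R" = rest. Missing years
    are treated as having no recorded action.
    Assumption: the year span is inclusive of start_year and final_year.
    """
    if final_year < start_year:
        raise ValueError("final_year must not precede start_year")
    n_years = final_year - start_year + 1
    in_span = [a for y, a in actions_by_year.items() if start_year <= y <= final_year]
    burns = sum(1 for a in in_span if a in ("B", "C"))   # combination counts as a burn
    grazes = sum(1 for a in in_span if a in ("G", "C"))  # and as a graze
    return burns / n_years, grazes / n_years
```

For a transect burned in 2008, burned and grazed in 2010, and grazed in 2012, with a final observation in 2013, both proportions are 2/6 under these assumptions.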
We included the transect state at the time of the first monitoring as a predictor in the model because visual inspection of the data implied that the condition of the transect at the beginning of the project was a strong predictor of goal achievement. We also included a random model effect for the management unit.
We tested models that included interactions between the initial state and each of the management alternatives as well as interactions between the management alternatives themselves. We also included a model that collapsed the grazing and burning variables into a single predictor that indicated the proportion of years that either burning or grazing took place. All model configurations were run using the lme4 package in R (Bates et al. 2015, R Core Team 2017). All models were compared using Akaike's information criterion (AIC; Akaike 1974) to find the best-fitting model.

Condition analysis
To evaluate how the condition of remnant prairies changed over the 9-yr period (2008-2016), we converted the 20 prairie states to a score ranging from 0 to 100. Because prairie states were categorical, this conversion created a linear measure for more interpretable analyses and summaries. The three components used to define prairie state (percent native cover; woody vs. herbaceous functional group-hereafter functional group; and proportion of native indicators) were converted to individual scores, ranging from 0 to 33.3, and summed for the final score (Table 1). Scores for percent native cover and functional group were based on three categorical levels, while we used raw values of the proportion of native indicators based on the ratio of the number of indicators in a given transect relative to the transect with the highest number of indicators (13) recorded in the entire dataset.
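The score construction can be sketched as follows. This is an illustration only: the function name is hypothetical, and the mapping of the three categorical levels to 0, 16.65, and 33.3 is an assumption made here for demonstration; the values actually used are those given in Table 1.

```python
MAX_COMPONENT = 33.3   # maximum score for each of the three components
MAX_INDICATORS = 13    # highest indicator count recorded in the dataset

# Assumption: three evenly spaced scores for the categorical levels 0, 1, 2.
LEVEL_SCORES = (0.0, MAX_COMPONENT / 2, MAX_COMPONENT)


def condition_score(cover_level, group_level, n_indicators):
    """Sum the three component scores into an overall condition score."""
    if cover_level not in (0, 1, 2) or group_level not in (0, 1, 2):
        raise ValueError("categorical levels must be 0, 1, or 2")
    if not 0 <= n_indicators <= MAX_INDICATORS:
        raise ValueError("indicator count out of range")
    # Indicators are scored relative to the richest transect in the dataset.
    indicator_score = MAX_COMPONENT * n_indicators / MAX_INDICATORS
    return LEVEL_SCORES[cover_level] + LEVEL_SCORES[group_level] + indicator_score
```

With this hypothetical mapping, a transect at the top level for both categorical components and with all 13 indicators scores 99.9, the practical maximum of the 0 to 100 scale.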
Mean values of overall scores, as well as scores for the three individual components, were calculated for all transects that had been monitored for at least 6 yr (a total of 434 transects in 80 management units). We compared mean values of starting scores (start) and ending scores (end) to assess the change in condition for all transects. We also evaluated change in condition between transects with low starting states (0-10) and those with high starting states (11-20).

RESULTS
Over the 9 yr of data included in this analysis (2008-2016), there were 765 unique transects from 130 management units located across 53 sites and covering over 4650 ha in western Minnesota and the eastern Dakotas. Project participants often staggered the monitoring of the management units included in the project over the 3-yr interval to even the annual workload, and participation varied over the years, with some land managers leaving their positions and new ones joining the project. Therefore, the number of transects and units included in the model varied over time (Table 2), but many transects were included for more than one 3-yr time step. Because the project started in 2008, the number of units and transects added to the model was greatest in 2008 and many of the same sites were monitored again in 2011. The first year of AM model results was 2011, with annual updates since that time.
Since 2008, we have had good representation for most of our management alternative combinations (Table 3). The most common management combination applied was 3 yr of rest (304 transects) followed closely by 1 yr of burning and 2 yr of rest (279 transects). However, combinations that included grazing were also frequently used. The only combinations to have no representation in the dataset were two management-intense alternatives: (1) 1 yr of burning followed by 1 yr of grazing and then 1 yr of burning (BGB) and (2) a within-year combination of burning and grazing followed by 1 yr of rest and then 1 yr of burning (CRB).

AM model results
The AM model provided results at two scales. The first set of results was the state-based management recommendations for each management unit monitored (Appendix S3: Table S3). After six intervals of model updates, the recommendations varied in intensity of management depending on the condition of the unit. Recommended actions for sites with states composed of >75% native cover (states 15-20) included more burning than grazing, but >75% of the management year recommendations were rest. States composed of <25% native cover (states 1-4) had more recommendations for grazing than burning, but 75% of the management year recommendations were also rest. Resting for all 3 yr was recommended for eight of the 20 states, burning once in 3 yr was recommended for six states, grazing once in 3 yr was recommended for four states, grazing twice in 3 yr was recommended for one state, and a combination of burning and grazing was only recommended for one state. In general, grazing was more often recommended for sites in lower quality states and burning was more often recommended for sites in higher quality states. The only recommendation for a non-rest management action to happen more than once in the 3-yr cycle was the recommendation of graze twice and rest once for state 7. Management unit recommendations were shared annually with managers to inform decisions about which actions to take to improve the condition of the prairie.
The second set of results was the overall learning for each state about which actions and which frequency of action improve the condition of the prairie. At the beginning of the project, we assigned equal weight to each of the four management alternatives and each of the two levels of frequency of action, and as data were added to the model, the support for the different alternatives began to diverge. Support for the management alternatives has taken three different directions. We used states 7 and 12 (Table 1) to illustrate the two most common patterns. State 7 showed less support for rest as the best alternative (Fig. 2A); however, 12 states showed more support for rest as the best alternative (e.g., state 17; Fig. 2B). Seven of the states had little to no divergence from the equal weightings (Appendix S3: Fig. S2). However, divergence among the non-rest alternatives was low for all states. The trend was similar for the frequency of management results. States 6 and 7 showed more support for the high-frequency management category (e.g., state 7; Fig. 3A), and 14 states showed the most support for the low-frequency management category (e.g., state 17; Fig. 3B). Four states showed little to no divergence among the high and low frequency of management categories (Appendix S3: Fig. S3).
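The weight updating described above can be illustrated with a minimal sketch of Bayes' rule applied to a discrete set of alternatives. This is not the project's AM model: the alternative labels and the likelihood values below are hypothetical, chosen only to show how support that starts out equal can diverge as monitoring data accumulate.

```python
def update_weights(prior, likelihood):
    """Apply Bayes' rule to weights over a discrete set of alternatives."""
    unnormalized = {name: prior[name] * likelihood[name] for name in prior}
    total = sum(unnormalized.values())
    if total <= 0:
        raise ValueError("likelihoods must give positive total support")
    return {name: value / total for name, value in unnormalized.items()}


def equal_weights(names):
    """Start every alternative with the same weight, as at project outset."""
    return {name: 1.0 / len(names) for name in names}


# Hypothetical labels for the four management alternatives.
ALTERNATIVES = ["rest", "graze", "burn", "burn_graze"]

# Hypothetical likelihoods of the observed outcome under each alternative,
# one dictionary per update interval. These are NOT values from the study.
ILLUSTRATIVE_LIKELIHOODS = [
    {"rest": 0.40, "graze": 0.20, "burn": 0.25, "burn_graze": 0.15},
    {"rest": 0.35, "graze": 0.25, "burn": 0.25, "burn_graze": 0.15},
    {"rest": 0.45, "graze": 0.20, "burn": 0.20, "burn_graze": 0.15},
]


def run_updates(names, likelihoods_by_interval):
    """Return the weights before any update and after each interval."""
    weights = equal_weights(names)
    history = [weights]
    for likelihood in likelihoods_by_interval:
        weights = update_weights(weights, likelihood)
        history.append(weights)
    return history


if __name__ == "__main__":
    for step, weights in enumerate(run_updates(ALTERNATIVES, ILLUSTRATIVE_LIKELIHOODS)):
        print(step, {name: round(value, 3) for name, value in weights.items()})
```

When the likelihoods are identical across alternatives, the weights stay at their equal starting values, which corresponds to the states that showed little to no divergence.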

Generalized linear mixed model results
Across all model configurations, we found that the strongest predictor of whether the percent native cover management goal was met was the initial state of the transect (Fig. 4). The importance of the burning and grazing management alternatives depended on the model configuration and the presence of other predictors in the model. Transects with a higher proportion of years grazed tended to have a negative association with the probability of meeting the native cover goal, whereas transects with a higher proportion of years burned tended to have a positive association with meeting the native cover goal. Collapsing the burning and grazing predictors into a single variable did not produce a strong predictor and did not improve model fit over considering the predictors separately.
The best-fitting model to predict meeting the percent native cover goal included the proportion of years grazed, the proportion of years burned, the initial state, and the interaction between the initial state and the proportion of years burned (Tables 4, 5). The second-best model was nearly equivalent and included the same predictor variables except the interaction term was the interaction between the initial state and the proportion of years grazed. The interaction between the initial state and the proportion of years burned was such that when the initial state was low, the transect was predicted to have a stronger positive response to burning (Appendix S3: Fig. S4), whereas the interaction between initial state and proportion of years grazed showed a similar pattern, but in the opposite direction (Appendix S3: Fig. S5).
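To make the direction of the burn interaction concrete, the following sketch evaluates a logistic model that includes an initial state by burn term. The coefficients are invented for illustration and are not the fitted values reported in Tables 4 and 5; the random effects of the mixed model are omitted, and the function and key names are hypothetical.

```python
import math

# Hypothetical coefficients on the log-odds scale (NOT the study's estimates).
ILLUSTRATIVE_COEF = {
    "intercept": -3.0,
    "state": 0.3,          # initial state (1-20)
    "burn": 2.0,           # proportion of years burned (0-1)
    "graze": -1.0,         # proportion of years grazed (0-1)
    "state_x_burn": -0.1,  # interaction: burn effect weakens as state rises
}


def prob_goal_met(initial_state, prop_burned, prop_grazed, coef):
    """Probability of meeting the goal under a fixed-effects logistic model."""
    eta = (
        coef["intercept"]
        + coef["state"] * initial_state
        + coef["burn"] * prop_burned
        + coef["graze"] * prop_grazed
        + coef["state_x_burn"] * initial_state * prop_burned
    )
    return 1.0 / (1.0 + math.exp(-eta))


def burn_effect_on_log_odds(initial_state, coef):
    """Change in log-odds per unit increase in proportion of years burned."""
    return coef["burn"] + coef["state_x_burn"] * initial_state


if __name__ == "__main__":
    for state in (3, 18):
        unburned = prob_goal_met(state, 0.0, 0.0, ILLUSTRATIVE_COEF)
        burned = prob_goal_met(state, 0.5, 0.0, ILLUSTRATIVE_COEF)
        slope = burn_effect_on_log_odds(state, ILLUSTRATIVE_COEF)
        print(state, round(unburned, 3), round(burned, 3), round(slope, 2))
```

With a negative interaction coefficient, the per-unit effect of burning on the log-odds is larger for a low initial state than for a high one, which is the pattern described for the best-fitting model. Reversing the sign of the main effect gives the opposite-direction pattern described for grazing.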
Similar to the models with the percent native cover goal as a response variable, we found that the initial state was an important predictor of meeting the proportion of native indicators goal across all models compared. The top six best-fitting models were all similar (Table 6). The best-fit model included the proportion of years grazed, the proportion of years burned, and initial state (Table 7).
Once again, we found that for all models that included the initial state of the transect as a predictor, there was a positive association between that predictor and meeting the proportion of native indicators goal (Fig. 5). Increasing the proportion of years grazed tended to be associated with a lower probability of meeting the native indicator goal, whereas increasing the proportion of years burned tended to be associated with a higher probability of meeting the native indicator goal. The coefficients on the interaction terms between the initial state and either proportion of years grazed or proportion of years burned were not significant (P > 0.05) in any of the models.

Condition of the prairie results
The overall mean score across all transects was essentially unchanged between the start and the end of the monitoring period. Mean overall scores went from 55.0 at the start to 52.9 at the end (Fig. 6A). The standard errors overlapped, indicating that the two means were not statistically different (Fig. 6A). Of the three individual components, only the functional group (shrub) score indicated a change over time, with a score of 24.4 at the start compared to 21.1 at the end.
A different picture emerges when looking at transects based on their starting state. Transects that started in a lower quality state (states 1-10) had a substantially higher overall score by the end of the monitoring period (28.8 at the start; 36.8 at the end; Fig. 6B). Scores of the individual components reveal that most of the increase in the overall score can be attributed to an increase in the percent native cover score. Transects that started out low quality had less cover of invasive plants at the end.
Conversely, higher quality transects at the start (states 11-20) had a substantially lower overall score by the end (67.9 at the start; 60.8 at the end; Fig. 6C). This decrease was explained by a decrease in the functional group (shrub) score (lower shrub scores indicate greater cover of shrub vegetation), indicating that high-quality transects experienced woody encroachment over time.

DISCUSSION
Over the 9-yr time frame for this AM project, we documented changes in the condition of the prairie, with clear improvement with the use of prescribed fire and uncertain outcomes for grazing. We were also reminded that change in the condition of the prairie is slow. Prairie ecosystems develop over thousands of years and are composed of predominately long-lived perennial plants (Weaver 1954). The changes in condition documented over the 9-yr period were relatively small. It will take much longer to dramatically change the system. Furthermore, the starting condition of the prairie had a clear effect on the management outcomes, and this supports the use of the state-based model for the AM framework.
Even though the initial state of the prairie had the greatest effect on the final state in the analyses, the increase in cover of native vegetation in low-quality states and the interaction of the initial state and burning suggest that management makes a difference. Burning showed greater improvement of condition on the lower quality sites. At the other end of the quality spectrum, the maintenance of a high cover of native vegetation in high-quality states lends support to the idea that diverse ecosystems may have greater ability to resist invasion from exotic species (Knops et al. 1999, Biondini 2007, Norland et al. 2015). In fact, six of our sites in high-quality states experienced no disturbance over the 9-yr interval and maintained that high-quality state. However, in tallgrass prairie on the edge of the eastern deciduous forests, a low frequency of disturbance, in particular fire, may also explain the increase in woody vegetation (Bond et al. 2005). Managers often focus on high-quality sites because they represent the best of the best, but these results suggest that management focused on low-quality sites may result in more gains.
Both the AM model and the generalized linear model support the benefits of fire in the northern tallgrass prairie but disagree on the use of grazing. Fire effects in central and southern tallgrass prairie have been relatively well documented (Collins 1990), and numerous benefits to native vegetation have been demonstrated (Bragg 1995, Engle and Bidwell 2001, Kral et al. 2018). Studies have also shown that appropriate use of grazing can be an important disturbance that improves prairie quality (Briske et al. 2011, Delaney et al. 2016). However, the ways that these management actions are employed can affect the outcomes (Engle and Bidwell 2001, Briske et al. 2011). The parameters around season and intensity of burn or stocking rate and rotational practices are not included in these analyses, and we know these variables, among others, affect outcomes (Lwiwski et al. 2015, Russell et al. 2015, Kral et al. 2018). For example, rotational grazing with a high stocking rate has a different effect on vegetation than season-long grazing with a low stocking rate (Briske et al. 2011). We were unable to include differences in grazing practices in the AM model itself because of the other frequency and management options being evaluated, but we recognize the importance of these factors. To evaluate the disagreement around grazing between the models, we need to explore some of the specific grazing variables, such as timing and intensity, and perhaps evaluate whether it is possible to add them to an updated version of the model.
Note: Graze is the proportion of years grazed, burn is the proportion of years burned, and burn or graze is the proportion of years burned or grazed.
Fig. 5. The predicted relationship between whether the proportion of native indicators goal was met (0 = not met; 1 = met) and (A) the proportion of years grazed, (B) the proportion of years burned, and (C) the initial state of a transect. The solid black line shows the modeled response curve. The gray ribbon shows the upper and lower bounds on the fitted curve. The dots show the predictions from the model. The bars along the top and bottom show the density of data points that meet the management goals.
The two statistical approaches were complementary and provided more comprehensive information than either one used alone. Adaptive management provides information that can be acted upon by managers annually through site-specific recommendations, and as data are added to the model, the site-specific recommendations will become better tailored to the sites in the region.
However, with AM it is more difficult to generalize about the overall effects of management on the condition of the prairie, and the designated time interval for model analysis could affect results. The linear model provides more information about general outcomes, incorporates site information across a longer time interval, but is less operational and is not tailored to the conditions at any specific site. Managers using the results from the linear model would have to make decisions about how the general findings would be best applied at any specific site.
Two factors limit generalization about the effect of management actions from AM projects. First, participating sites are not chosen randomly or stratified across any variable, such as starting condition, and second, the management actions themselves are not applied in an experimental or balanced design. Site selection by managers can be biased either by choosing sites closer to the headquarters or by starting condition, and in this study, there was a bias in management application across states. Sites in high-quality states were less likely to be grazed and more likely to be burned. Grazing by cattle has not always been perceived as having a positive effect in northern tallgrass prairies, and the fencing infrastructure necessary to implement grazing at many of the high-quality sites has been removed. This leaves fewer transects to evaluate the effect of grazing on sites with high-quality initial conditions, leaving that aspect of the parameter space minimally tested. Additionally, fewer grazing actions on higher quality sites could explain the lower number of grazing recommendations for these sites in the AM model results. However, regardless of whether recommendations were followed for any of the sites across the 3-yr time step, the monitoring of the outcomes and the actions will still contribute to model learning.
After 9 yr of data collection, it was useful to pair these two approaches to evaluate the overall effects of management and the assumptions about the ecological system responses that were built into the AM model at the outset. The comparison of these approaches was a valuable first step for the next phase of AM, often referred to as double-loop learning (Williams and Brown 2018), to adjust the AM model based on what has been learned. Specifically, we have identified four major issues to address to improve the AM model. The first is the incorporation of the difference in cost between management actions. Managers are interested in knowing the cost-benefit trade-off between money spent and ecological improvement gained. The second is the lack of separation among management actions. The results of the linear model can potentially be used to refine the AM assumptions or model structure and thereby improve the resolution of the results. The third is evaluating whether other stochastic or environmental variables might be affecting the outcomes observed. Capturing the known interactions of burning and grazing with variables like precipitation may improve model recommendations for managers (Derner et al. 2018, Yurkonis et al. 2019). Finally, improving the reporting mechanism of the AM model to provide managers with a range of management options would be beneficial. One of the shortcomings of AM is that it produces a single recommendation even if the top recommendation is not much different from the alternatives, and a menu of alternatives and their support would likely improve the use of and support for the AM framework.
There are many challenges and opportunities in the use of an AM approach. In general, the AM approach is intended to incorporate learning directly into the decision-making process that all land managers go through every year and to help them think differently about their decisions. The information from the AM model provides them an opportunity to make state-based management decisions rather than following an arbitrary disturbance interval, for example, every five years, for burning or grazing. More informed decision making, however, requires documenting the starting condition and change over time via monitoring. This emphasis on monitoring to inform decisions and outcomes is a shift in practice and an investment of time and money. The U.S. Fish and Wildlife Service has recognized the benefits of this approach and has begun to implement AM to address a range of different management problems (Johnson et al. 2011, Tyre et al. 2011, Moore et al. 2011a, Gannon et al. 2013, O'Donnell et al. 2017). An additional benefit of the AM approach is the shared learning that occurs from so many participants. The monitoring and management information acquired across all participants is much greater than any individual could accomplish alone. Nevertheless, conveying the basic principles of AM to many cooperators and sustaining buy-in over long time frames, given staff turnover at all levels, is challenging. Continued engagement through the annual reporting that is part of the AM framework is an ongoing reminder that we continue to learn and that the managers' engagement and support for both active management and monitoring is necessary. The annual reporting also reminds managers that they are part of a collaborative network, where learning together is essentially the only way to learn.
The AM framework relaxes the requirements of a strictly experimental approach and does not require managers to randomly assign treatments to plots, nor did we require that they implement the recommendations. To restrict managers in either of these ways is not feasible, especially in a project of this scale. Managers are ultimately responsible for the outcome of their management (or lack of it), and the long-term, randomly assigned, idle controls necessary with a more traditional experimentation approach are generally unacceptable (DeKeyser et al. 2009, Grant et al. 2009). The advantage of this relaxed approach is that we were able to build a very large sample size because managers were willing to engage and collaborate. The downside is that it is difficult to assess what effect relaxing these requirements had on our findings. We believe that the advantage of strong replication outweighs the strict adherence to randomization, but we cannot be sure. Regardless, we do not envision a way to incorporate strict randomization and large sample sizes into a project like ours, given the primary missions of most conservation and management agencies.
We have successfully implemented a large-scale AM project across multiple agencies for nearly a decade. We attribute this success to five key lessons learned. Kicking the project off with an in-person workshop where participants were fully engaged was essential to starting the multi-agency effort and ensuring overlapping goals and objectives for the project given different organizational mandates (Reynolds et al. 2016). Establishing a standardized protocol and a centralized database that handled data entry, model updating, and reporting efficiently was especially important for this project with multiple U.S. states, administrative units, and agencies involved (Hunt et al. 2015, 2016). Engaging a multi-disciplinary project team is important. The input and guidance from AM experts, modelers, and database developers is essential, while the management agency staff keep the discussions focused on key management uncertainties and feasible solutions, given logistical, budgetary, and staffing conditions. Maintaining coordination, staffing, and institutional memory for a long-term project, given budget changes and uncertainties, is also important for success. Clear, written documentation of both the administrative and scientific aspects of the project helped maintain continuity, standardize methods, and streamline training of new staff. Over the life of the project, we have had turnover in staff in all the participating agencies and at multiple field stations, but the primary coordinators from the three agencies have been stable. Having a lead champion from each organization is important. Finally, regular communication among the primary project proponents is crucial with the annual updating of an AM framework. This ensures that monitoring, model updating, and reporting stay on track.
The challenges presented by using an AM framework are great but not insurmountable, and the opportunities for learning, decision support for management, and improved condition of our natural resource are worth it.

ACKNOWLEDGMENTS
Funding was provided by the U.S. Fish and Wildlife Service, the Minnesota Department of Natural Resources, and the Cox Family Fund for Science and Research. We thank C. Moore, P. Heglund, F. Harris, R. Dana, and M. Cornett for their contributions to the project. We thank the numerous land managers who contributed input and data to this project. Mention of trade names or commercial products does not constitute endorsement or recommendation for use by the U.S. Government. The findings and conclusions in this article are those of the authors and do not necessarily represent the views of the U.S. Fish