Ideas and perspectives: Beyond model evaluation – combining experiments and models to advance terrestrial ecosystem science

Ecosystem manipulative experiments are a powerful tool to understand terrestrial ecosystem responses to global change because they measure real responses in real ecosystems and yield insights into causal relationships. However, their scope is limited in space and time due to cost and labour intensity. This makes generalising results from such experiments difficult, which creates a conceptual gap between local-scale process understanding and global-scale future predictions. Recent efforts have seen results from such experiments used in combination with dynamic global vegetation models, most commonly to evaluate model predictions under global change drivers. However, there is much more potential in combining models and experiments. Here, we discuss the value and potential of a workflow for using ecosystem experiments together with process-based models to enhance the potential of both. We suggest that models can be used prior to the start of an experiment to generate hypotheses, identify data needs, and in general guide experimental design. Models, when adequately constrained with observations, can also predict variables which are difficult to measure frequently or at all, and together with the data they can provide a more complete picture of ecosystem states. Finally, models can be used to help generalise the experimental results in space and time, by providing a framework in which process understanding derived from site-level experiments can be incorporated. We also discuss the potential for using manipulative experiments together with models in formalised model–data integration frameworks for parameter estimation and model selection, a path made possible by the increasing number of ecosystem experiments and diverse observation streams. The ideas presented here can provide a roadmap to future experiment–model studies.

is often difficult to (explicitly) generate hypotheses that reflect multiple known individual processes and their (unknown) interactive effects. Using a model, or multiple models, or multiple process representations within one modelling framework can aid the hypothesis generation.
Some experiments on shorter-lived, herbaceous species have shown shifts in species composition in response to elevated CO2 (Reich et al., 2018), but similar changes in longer-lived organisms are harder to observe in EMEs. These types of responses and their lack of representation in ecosystem models imply limits for data-constrained and model-based temporal up-scaling. In terms of short-term plasticity or acclimation, recent advances in using eco-evolutionary optimality in models (Harrison et al., 2021) can represent plastic plant responses, and have successfully been used together with EMEs (Caldararu et al., 2020; Sabot et al., 2022). However, current ecosystem models are not well suited for dealing with changes in species composition, demography, and competition processes. However, trait-based, individual or cohort-based models (Fisher et al., 2015).
Particularly for evaluating simulated acclimation and other phenotypic plasticity that operate at time scales of months to years, it will be important to make targeted use of insights gained from EMEs. Since this is a knowledge gap in both EMEs and models, it is an opportunity for both communities to work side by side, rather than sequentially, to advance our knowledge of plant adaptation.
In recent years, there has been a global effort to overcome the limitations of the different approaches and to meet the need for standardised controlled experiments across wide temporal and spatial scales. can drastically facilitate model-data integration studies, performed for an extended set of sites and experiments. Multi-experiment modelling may be essential for powerful generalisability tests and uncertainty quantification of model predictions, e.g., by a leave-experiment-out cross-validation (see also Section 3). Thus, using experiment networks and compilations together with models would increase confidence in the generality of model predictions and partially deal with the upscaling issues discussed in Section 3.3. The use of CDE results could also help to improve the parameterisation of process-based models in ecosystems that are underrepresented, but this would require better coordination between the response variables measured in the field and the processes included in the models (N. G. Smith et al., 2014).

Model-data integration
Data assimilation (DA) and model-data integration (MDI) are broad umbrella terms for a variety of statistical methods that fit the parameters of process-based models to observations. The methods are well established and widely used with remote sensing data. One of the main issues with using DA methods is that the data used to constrain models are observations in present or past conditions, raising questions about the capacity of the resulting models to predict ecosystem responses under future conditions. Therefore, data from manipulative experiments can be extremely valuable in providing information on as yet unobserved conditions. Another common issue with using remote sensing data to parameterise models is that the observations used in DA need to be variables that are actually represented in models (MacBean et al., 2022), so that most remote sensing indices need to be processed further before they can be used. In contrast, experimental observations provide information that can easily be mapped to model variables: biomass, ecosystem fluxes, soil pools, etc.
MDI provides a formalised approach to make best use of naturally sparse EME observations and combine them with a priori understanding embodied in model structures. Integrating diverse ecosystem data can help estimate the system state, given physical constraints that are built into the model (e.g., mass conservation) (Jiang et al., 2020). MDI can also provide an approach to formalised model selection (Mark et al., 2018) and a systematic treatment for trading off model complexity and fit to the data. EMEs, in contrast with the observational data commonly used in MDI studies, often have particularly strong leverage in discriminating between the predictions of alternative models. Only if an ecosystem's slow biogeochemical cycling is "hit hard" in an experimental setup are the underlying processes revealed. Thus, EMEs provide key information that is required for robust model selection: the discrimination of alternatively formulated model structures that reflect alternative hypotheses of how ecosystem processes work.
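As an illustration of what fitting model parameters to observations while retaining prior understanding can look like, one widely used formulation is a Bayesian (variational) cost function with Gaussian errors, and one common way to trade off complexity against fit is the Akaike information criterion. This is a generic sketch, not the specific method of any study cited here; all symbols are assumptions introduced for illustration: $\theta$ is the parameter vector, $M(\theta)$ the model prediction of the observed variables, $y$ the observations, $R$ the observation error covariance, $\theta_b$ and $B$ the prior parameter mean and covariance, $k$ the number of fitted parameters, and $\hat{L}$ the maximised likelihood.

```latex
% Cost function minimised in a Gaussian, variational DA setting (illustrative)
J(\theta) = \tfrac{1}{2}\,\bigl[y - M(\theta)\bigr]^{\mathsf{T}} R^{-1} \bigl[y - M(\theta)\bigr]
          + \tfrac{1}{2}\,(\theta - \theta_b)^{\mathsf{T}} B^{-1} (\theta - \theta_b)

% One common information criterion for comparing model structures (illustrative)
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

The first term penalises misfit to the data and the second penalises departure from prior knowledge. Among candidate model structures fitted to the same data, a lower AIC indicates a better balance between fit and number of parameters.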
However, typical models used for global biogeochemical cycle and climate change impact simulations are often complex and contain a large number of weakly constrained parameters. In view of the sparsity of EME data, this poses a risk of overfitting.
Overfitting may be mitigated using a "leave-experiment-out cross-validation" approach, where one experiment at a time is systematically left out of the model fitting procedure and used as an out-of-sample test. This may enable a more robust calibration of model parameters and provide a more reliable estimate of the spatial generalisation error. However, the environmental space currently covered by EMEs and their available data is limited, and gaps remain particularly in the tropics and for CO2 experiments in all biomes except temperate forests and grasslands (Van Sundert et al., 2023).
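The leave-experiment-out procedure described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not an implementation used in any cited study: a two-parameter linear model stands in for a process-based model, the data are synthetic, and all names (`fit_linear`, `leave_experiment_out_cv`, `site_0`, and so on) are hypothetical.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares fit of y = a + b * x (stand-in for calibrating a process model)."""
    design = np.column_stack([np.ones_like(x), x])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params


def predict_linear(params, x):
    """Predict the response from fitted parameters (a, b)."""
    return params[0] + params[1] * x


def leave_experiment_out_cv(experiments):
    """Hold out each experiment in turn, fit on the rest, and score the held-out one.

    experiments: dict mapping experiment name -> (x, y) arrays.
    Returns a dict mapping experiment name -> out-of-sample RMSE.
    """
    errors = {}
    for held_out in experiments:
        train = [xy for name, xy in experiments.items() if name != held_out]
        train_x = np.concatenate([x for x, _ in train])
        train_y = np.concatenate([y for _, y in train])
        params = fit_linear(train_x, train_y)
        test_x, test_y = experiments[held_out]
        residuals = test_y - predict_linear(params, test_x)
        errors[held_out] = float(np.sqrt(np.mean(residuals ** 2)))
    return errors


def make_synthetic_experiments(n_experiments=4, n_obs=20, seed=0):
    """Synthetic data only: a hypothetical driver x and a noisy linear response y."""
    rng = np.random.default_rng(seed)
    experiments = {}
    for i in range(n_experiments):
        x = rng.uniform(0.0, 1.0, n_obs)
        y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, n_obs)
        experiments[f"site_{i}"] = (x, y)
    return experiments


experiments = make_synthetic_experiments()
cv_errors = leave_experiment_out_cv(experiments)
for name, rmse in cv_errors.items():
    print(f"{name}: held-out RMSE = {rmse:.3f}")
```

The spread of held-out errors across experiments gives a rough indication of how well a calibration transfers to sites it has not seen. With a real process-based model, `fit_linear` and `predict_linear` would be replaced by the model's own calibration and simulation steps.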

New data for EME-model synthesis
Despite the potential for model-informed ecosystem experiments discussed above, data availability still defines model use because without data, models are impossible to constrain. Advances in measurement techniques and data processing offer an opportunity to increase the types and frequency of measurements that can be gathered at EMEs. EMEs are one of the best locations to develop new data streams because they are typically already well parameterised, offering established data in addition to that from novel sources and potentially benefiting from frequent site visits necessary for non-standard instrument development. Thus, in addition to the direct feedback with individual EMEs, models can set the agenda for data development through transparent discussion of the uncertainty and parameter sensitivity.
Perhaps most straightforwardly, EMEs can be equipped with proximal sensing devices such as phenocams (Brown et al., 2016) or sun-induced fluorescence (SIF) sensors (Yang et al., 2015). These provide continuous, regular measurements of (generally) canopy properties. While these types of measurements are available from spaceborne sensors, local measurements provide more spatial detail and often the possibility of measuring each individual plot separately, thus identifying treatment effects in smaller-scale EMEs. Unlike spaceborne instruments, proximal sensors usually require calibration between different treatments, but on the other hand they provide regular, location-specific information. Such