On the problems of using linear models in ecological manipulation experiments

. Manipulation experiments are often used to investigate ecological and environmental causal relationships and to understand and forecast impacts of anthropogenic pressures on ecosystem function-ing. Such manipulation experiments often use factorial designs, and the data are analyzed using factorial linear models. Factorial designs build on the fundamental assumption that the treatment factors are independent and orthogonal. This assumption is, however, often violated because of variation within and in particular covariation between the performed experimental manipulations. For example, manipulation of temperature and precipitation in factorial setups has been widely applied in climate experiments, but manipulating soil temperature will likely have a strong impact on soil water content. Such dependency among environmental state variables will violate the assumed orthogonality in a factorial linear model and may lead to erroneous conclusions. Here, we demonstrate the importance of the assumption of orthogonality using simulated ecological responses that act on observed soil state variables from a large climate experiment with an apparent orthogonal design. More speci ﬁ cally, we explore the problematic consequences of analyzing ecological treatments as categorical variables in a linear model. Suitable alternative methods for the statistical analysis of manipulated ecological experiments are suggested. The key recommendation is to use the observed effects of the manipulations on the state variables directly in the analysis instead of the categories of treatments. For example, if soil water content and temperature are manipulated, then it is essential to measure the water content and temperature in the soil of all the manipulated plots.


INTRODUCTION
For over a hundred years, the factorial design has been the cornerstone in manipulation experiments in the scientific investigation of causal relationships, and the theory of experimental design has reached a high degree of sophistication in the applied scientific areas of agriculture, medicine, and industry, where the theory mainly was developed (Cox and Reid 2000). Experimental ecology has benefitted from developments in the theory of experimental design and adopted key parts of the developed terminology and concepts, for example, orthogonal factors, block-and split-plot designs, as well as the standard methods of statistical analysis using linear models. Orthogonal experimental designs are effective for investigating the effect of several factors that may interact (Cox and Reid 2000), and the analysis and interpretation of an orthogonal design are relatively simple because the main effect and interaction terms are estimated independently. Therefore, the orthogonal design is widely used when designing ecological experiments. However, some of the fundamental assumptions in the theory may not hold when manipulating ecosystems, and this may have important consequences for the design and analyses of ecosystem experiments.
A general and key property of experimental manipulations, which becomes critical if data subsequently are analyzed using linear models, is unit-treatment additivity. This means that effects of specific treatments are additive and constant for different experimental units, except for random noise (Cox and Reid 2000). The practical implication of this property is that the levels of the different treatment factors should be known without error, that is, without application error, measurement error, or unknown dependency on other manipulated factors. The assumption of unit-treatment additivity is apparent in the design matrix of linear models and is a critical assumption in both parameter estimation and statistical inferences using linear models.
In experimental ecology, these basic assumptions are difficult to meet because of causal dependency among applied factors or because often it is not possible to manipulate the studied factors with sufficient precision, which leads to considerable among-replicate variation for the same treatment. For example, it is difficult to manipulate the in situ soil temperature to a precise soil temperature, and the temperature manipulation will additionally perturb other environmental state variables in the studied ecosystem.
Nevertheless, the data from ecological experiments are often analyzed as if the manipulations were independent and orthogonal. For example, the statistical analysis, performed in the large majority of the more than 100 peer-reviewed scientific publications of our climate experiment, assumed that the manipulated factors were independent and orthogonal, and in our experience, this is a typical situation for many experiments with ecological manipulations. The objectives of the present study were to demonstrate important consequences of variability and inter-correlation between treatment levels when making statistical analyses of climate manipulation experiments with orthogonal designs and to suggest suitable alternative methods for the statistical analyses of such experiments in general.

Case study
We used observational data of soil temperatures and soil water content from the Climaite experiment (Mikkelsen et al. 2008) to exemplify the potential problems and errors in applying orthogonal analyses to observations from ecosystem experiments. In the Climaite experiment ( Fig. 1), atmospheric CO 2 concentration, soil water content, and temperature were manipulated in an orthogonal split-plot design consisting of a total of six blocks. Each block included two octagons, each 6.8 m in diameter, one with ambient (A) and one with elevated CO 2 by free air carbon dioxide enrichment. Within each octagon, drought (D) was applied to half of the octagon by automatic rainout curtains removing the precipitation during the application time. Correspondingly, warming (T) was applied to half of the octagon perpendicular to the drought as passive nighttime warming by automatic roll out curtains applied at night to reflect the infrared radiation from the ground and increasing the temperature in the plots by 0.5-1°C (Arndal et al. 2018). The design created a total of four treatment combinations within each octagon and, thus, eight treatment combinations within each block involving the ambient reference (A) and all combinations of T, D, and CO 2 replicated six times, that is, N = 48.
Within each experimental plot, the volumetric soil water content (0-20 cm soil depths) and soil temperature (5 cm soil depth) were measured continuously at 10-to 30-min intervals and stored at hourly time resolution (for technical details, see Mikkelsen et al. 2008).
A wide range of biological response variables have been investigated in the Climaite experiment, and the results have been reported in >100 peerreviewed scientific papers. The findings range from measurements of responses in plant processes, such as photosynthesis (Albert et al. 2011), the aboveground plant community (Kongstad et al. 2012), and belowground root production (Arndal et al. 2018), to responses in the soil, such as soil respiration (Selsted et al. 2012), nitrogen cycling (Larsen et al. 2011), soil fauna diversity (Holmstrup et al. 2017), and aboveground herbivore fauna (Scherber et al. 2013). The observed effects include significant responses to individual factors or combinations of treatment factors as well as interactions among them or no effects at all.

Data simulations
The strategy chosen in the present study was to simulate a supposedly a priori known ecological process that is affected by the manipulated drivers, that is, temperature and soil water content, and then afterward to analyze the simulated response in an orthogonal mixed-effect linear model. Since the simulated ecological process is known, it is possible to compare the statistical analysis of the simulated ecological responses using an orthogonal mixed-effect linear model with the true, known ecological process. For example, if we know that the ecological process is independent of a specific driver and the orthogonal mixed-effect linear model shows a relationship with the driver, then we may conclude that the statistical analysis is flawed.
In the simulations, it is assumed that the ecological responses are only affected by the manipulated drivers, that is, the manipulated soil state variables. Following this, we simulated the ecological responses to the measured manipulated soil drivers (soil water content and temperature). The simulated ecological responses were then subsequently analyzed, as if we only had information of the orthogonal treatments.
If the measured soil temperature and water content are denoted x 1 and x 2 , respectively, then the simulated ecological response, y, is modeled deterministically without any stochastic error, as where a, b, c, and d are constant parameters (Fig. 2). For example, if soil temperature and water content in a given plot were measured to be 2 and 3, respectively (x 1 ¼ 2; x 2 ¼ 3), and ða ; b ; c; dÞ is set to ð2; 1; 3; 4Þ, then the simulated ecological response We made a total of six different response models in which we let the parameters, a, b, and c, vary with different combinations of the values À1, 0, 1, or 10. For simplicity, all simulated ecological response models were set up to be independent of atmospheric CO 2 concentrations.

Statistical analysis
The six different simulated ecological response models were analyzed as if we only had information of the orthogonal treatments (Fig. 2) in a mixed-effect linear model with fixed effects temperature (T), drought (D), and CO 2 and their interactions (T 9 D, T 9 CO 2 , D 9 CO 2 , and T 9 D 9 CO 2 ). The random effect was specified as block/CO 2 /T/D, reflecting the used split-splitplot design of the experiment, and analyzed using the nlme library in R (Pinheiro et al. 2013).

RESULTS AND DISCUSSION
In principle, the three manipulation factors in the study were applied independently and, as mentioned above, a prerequisite for analyzing the responses by linear models is that the factors are independent and additive. However, a number of interactions are inherent, such as the passive nighttime warming technique causing simultaneous reduction in soil moisture due to increased evaporation related to increased temperature, but likely also related to reduced dew formation during nighttime due to the coverage by the reflective curtains used for nighttime passive warming. Although the reduction of soil moisture in warmed plots was small, it was consistent and statistically significant (Holmstrup et al. 2017). For some biological responses, even such minor changes in soil moisture may be important and therefore must be accounted for when trying to unravel the mechanistic interactions between organisms and the environment. In the present example, some of the soil moisture reduction is confounded with the temperature effect when the ecological responses are analyzed in an orthogonal linear model. Such secondary effects of one driver on other drivers are widespread in ecosystem manipulations, for example, (1) if plant density is augmented, then soil water content is expected to decrease, while the local air humidity is expected to increase and, again, will be included as an effect of the temperature and not of moisture, or (2) in elevated CO 2 plots, increased plant water use efficiency may cause soil water content to increase if plant biomass remains constant (Albert et al. 2011).
The was effective, although there were three plots out of 24 with relatively high soil water content, whereas the temperature treatment was less effective with a large overlap among the soil temperatures of the control and the temperature treatment plots (T). The mean and standard deviations of the effect of the treatments on the underlying soil state variables are shown in Table 1. The sample covariance of the measured soil temperatures and water contents was 0.272, which, however, did not differ significantly from the assumption of independence (two-sided t-test [n = 48]: P = 0.23). When viewed together, it is clear from Fig. 3 that the sizable variance in soil temperature and marked covariance among the manipulated soil state variables had the consequence that treatments were neither independent nor orthogonal.
In our synthetic data simulation, when the ecological response was analyzed using the orthogonal mixed-effect linear model, the results deviated distinctly from the known ecological process under which the ecological response data were simulated (Table 2). Particularly, the simulated effect of temperature was only significant when either the effect of D was removed (Table 2, b = 0) or when the effect of T was set ten times higher than the effect of D (Table 2, a = 10, b = À1). In addition to the simulation shown in Tables 2 and 3, we also did multiple simulations where the observed temperature effect caused by the treatment (as shown in Table 1) was gradually increased. We wanted to see at which level model 3 (Table 2, a = 1, b = 1,  ❖ www.esajournals.org c = 0) would come out significantly for the T treatment. The simulations showed that only when the T effect was enhanced by at least 5°C (with SD kept as in Table 1), did the orthogonal mixed-effect linear model show a significant effect of T in more than 95% of the simulations of model 3 (Appendix S1: Fig. S1). This highlights the fact that weaker treatment effects disappear in combination with stronger treatment effects when the experimental effects are not orthogonal. Similarly, the simulated interaction effects between temperature and drought were not significant in the analysis using the orthogonal mixed-effect linear model (Table 2, c 6 ¼ 0). If the simulated data set of the response variable was log-transformed, the same qualitative statistical result was obtained (not shown).
To show the effect of lack of orthogonality, we also ran the orthogonal mixed-effect linear model on the six different simulated responses after forcing the treatments of temperature (T) and drought (D) to be both constant and orthogonal, that is, by using the observed means (Table 1). Now the effect of T remains significant in all cases where a = 1 or 10 and interactions come out significantly when c 6 ¼ 0 ( Table 3).
One of the important features of the orthogonal design is that the main effects and interaction terms are estimated independently. However, this is clearly not the case in the present example, where even large effects of temperature are masked by the drought treatment. Consequently, the statistical analysis using the orthogonal mixed-effect linear model can lead to acceptance of erroneous zero hypotheses.
In general, all estimated P-values will be biased either upwards or downwards if the ecological experiment is analyzed with a linear statistical model if treatments are not independent and orthogonal. For example, even when the Notes: The corresponding P-values are shown in parentheses, where coefficients in bold are significantly different from zero. In line three and five, we a priori know that there is an effect of both soil temperature (a 6 ¼ 0) and water content (b 6 ¼ 0), whereas there is no interaction effect (c = 0), but the statistical analysis suggests that there only is an effect of water content. In the last four columns, the effect of CO 2 and its interaction effects with D and T are shown; note that there are clear patterns of the effect of CO 2 , even though the simulated ecological responses were set a priori to be independent of CO 2 . The a priori expected coefficients, if treatments of temperature (T) and drought (D) were fully orthogonal, are simulated in Table 3.
† Coefficients should have been different from zero according to a priori expectations. Table 3. Parameter settings used in the simulations (a, b, c, d = 1000) and the estimated coefficients of the different factors and their interactions, when treatments of temperature (T) and drought (D) were forced to be fully orthogonal using the mean data of soil temperatures and soil water contents reported in Table 1, and simulated response data were analyzed using the orthogonal statistical design. Notes: The corresponding P-values are shown in parentheses, where coefficients in bold are significantly different from zero. The generated response data were added a small amount of stochastic noise (Normal (0, 0.05)). simulated ecological response was assumed to be independent of CO 2 and without any random noise, all the estimated CO 2 effects were positive (Table 2). Consequently, if there had been a positive effect of CO 2 , then the estimated P-value would have been downward biased relative to the correct P-value and, oppositely, if there had been a negative effect of CO 2 , then the estimated P-value would have been upward biased.

Spatial variation and ecosystem representation
In addition to the above-mentioned situations where interactions and indirect effects of the treatment factors compromise the orthogonality assumptions, situations exist where spatial and temporal variation at a site makes the strict assumptions of orthogonal and independent factors even more difficult. Representative or comparable sampling is always a challenge in field-scale experimentation, but sound conclusions on the effect of a given treatment are critically dependent on comparisons of comparable entities. Lack of comparability among samples may increase the apparent variability and noise and thereby prevent differences among treatments to be recognized. More importantly, in the context of orthogonality a lack of comparability can compromise the assumptions behind the statistical analysis. If, for example, a soil water manipulation treatment leads to uneven spatial distribution of soil moisture and field measurements of soil moisture fail in capturing this aspect of the treatment, the results using a standard treatment level may lead to erroneous results. Many physical and biological characteristics of ecosystems, such as soil type, soil stratification, vegetation patterns, and microtopography, may directly interact with key climate treatments, for example, temperature and water.
Another challenging example is the spatial scale of variability in vegetation. In some vegetation types, for example, heathland vegetation, the scale of the spatial variation is in meters, which is comparable to the plot size of many manipulated ecological experiments (Fig. 1), and in some vegetation types like forests, the spatial variation even exceeds the plot size. Consequently, if vegetation type is not included as a variable in the analysis of such experiments, then possible effects of the vegetation may be confounded by effects at the plot level and lead to erroneous conclusions. Distance is not the best indicator of the spatial variation in spatially aggregated vegetation with repetitive patterns, and the notion of blocking may need to be reconsidered in manipulated ecological experiments. It is therefore recommended to critically assess the characteristics of possible spatial variation within the experimental area.

Conclusions-recommendations and alternatives
We recommend measuring the effect of the manipulations directly for all the manipulated state variables with high precision and accuracy, and if the assumptions of unit-treatment additivity and orthogonality cannot be upheld, then analyzing the experimental data using the underlying variation of the manipulated state variables as the explaining factor for the observed ecological response. To our knowledge, there is no formal test of orthogonality, and instead, we recommend to visually inspect plots of the manipulated state variables (like Fig. 3) in order to determine whether the assumption of orthogonality can be upheld.
If the conclusion is that the assumption of orthogonality cannot be upheld, then the statistical analysis can be done using simple regression models or more advanced process-based models. Furthermore, the full characterization of all the manipulated state variables also allows a more flexible experimental design with an augmented number of replicates at typical levels of the state variables or where threshold effects of the ecological response are expected.
For some ecological responses, it may be relevant to analyze time series data rather than relying on static data in a problematic spatial blocking design. Using time series data, it is possible to analyze the change of the measured ecological response variable, for example, population growth, instead of measuring the absolute level of the ecological response variable, for example, population size. This has the advantage that changes in state variables are expected to be less sensitive to large internal variability in plot characteristic features and histories compared to the absolute values of the states. Consequently, if the plot size is relatively large compared to the observed ecological response, then it is a reasonable assumption to analyze the change in ecological responses in each plot as if they were independent experimental units. This assumption of independence may be tested by plotting the residual variation of the time series.
Another promising method for analyzing manipulated ecological experiments is structural equation models (SEM) in which causal relationships with a high degree of certainty can be specified from the manipulations to the underlying variation of the ecosystem, and where the ecological response is modeled as causal relationships from the underlying variation of the ecosystem (as outlined in Fig. 2). Structural equation models allows testing for the effect of the manipulations as well as possible interdependence among the different manipulations on the underlying variation of the ecosystem. More importantly, SEM does not assume orthogonality of the manipulations, and since the underlying variation of the manipulated ecosystem is measured directly, the assumption of unit-treatment additivity is met. For example, in a recent study SEM was used to model the effect of soil physical conditions (temperature and moisture) on the biodiversity of soil fauna (Holmstrup et al. 2017). By using a traditional statistical ANOVA model including all treatments and their interactions, the authors did not find significant effects of any of the three treatments (temperature, drought, or CO 2 ) on biodiversity, which was a counterintuitive result because soil moisture is a critical environmental factor for abundance and function of many soil invertebrate taxa (Rapoport and Tschapek 1967). The reason for lack of significant drought effects may be related to the confounding effects of temperature and CO 2 on soil moisture, but may also be related to high variability of the particular field site and limited statistical power (n = 6). Using SEM gave more degrees of freedom to test the general hypothesis that soil moisture is, indeed, important for many species of soil animals and that this factor expectedly would be reflected in the biodiversity. The SEM showed that there was a significant relationship between soil moisture and the biodiversity of certain functional groups, but also that elevated atmospheric CO 2 had indirect effects on biodiversity through increased litter C:N ratio. The add-on effect of using SEM was, therefore, that more mechanistic understanding may be revealed as compared to traditional factorial analysis.
In conclusion, the common current practice of analyzing ecological experiments with categorical treatment variables in linear models, with the implicit assumption of independence and orthogonality, will, in our opinion, most likely lead to erroneous conclusions. Instead, we recommend applying more regression type statistical models, such as SEM, in the analysis of manipulated ecological experiments.

ACKNOWLEDGMENTS
The CLIMAITE project (CLIMate change effects on biological processes In Terrestrial Ecosystems) was funded by the Villum Kann Rasmussen Foundation and further supported by Air Liquide Denmark A/S, DONG Energy and the participating institutions.