Accelerated seed dispersal along linear disturbances in the Canadian oil sands region

Habitat fragmentation is typically seen as inhibiting movement via erosion in connectivity, although some patterns of early-phase disturbance, such as narrow linear disturbances in otherwise undisturbed forests, may actually facilitate the dispersal of certain species. Such features are common in Alberta’s oil sands region as legacies from seismic hydrocarbon exploration used to map oil reserves. Many of the ecological implications of these disturbances are unknown. Here, we investigate the effect of these forest dissections by experimentally testing dispersal patterns along seismic lines compared with adjacent forests using two proxy materials for wind-dispersed seeds, Typha latifolia seed and goose down feathers. We found that wind speeds were up to seven times higher and 95th percentile seed dispersal distances nearly four times farther on seismic lines compared with undisturbed forests and the corresponding effect of these features on seed dispersal distances can be substantial, potentially facilitating future changes in composition and ecological processes in boreal forests. This raises important considerations for native and invasive species, particularly in the context of climate change and the associated importance of seed movement and migration.

Note that seed type and line width are categorical variables, upon which the correlation with other predictors depends on the order of categories. The order of seed type (goose down vs. Typha latifolia) is inconsequential as it contains only two levels. Here, we have ordered the line width categories (1 = control, 2 = wide, 3 = narrow) in order of increasing wind speed (Figure 2a) to determine the maximum potential covariance for model variable selection. Note the high collinearity (r = 0.81) between wind speed and line width. A correlation plot is also provided in Figure S2. Seed (Figure 3a) to determine the maximum potential covariance for model variable selection. Significance codes: *** p < 0.001; ** p < 0.01; * p < 0.05. A table of correlations is also provided in Table S2.    3,4 . We note that this approach is not driven by a lack of thinking about the problem of interest: our full model (Step 1) has been defined based on clear a-priori hypotheses (see main text of the paper as well as below for our rationale in including single terms and some of their interactions). Rather, our goal is to make sure that alternative model selection approaches, used to reduce the complexity of our starting a-priori model, lead to the same conclusion, thus increasing confidence in our results for readers preferring one approach over another 5 .
Briefly, the process involves: Step 1 Fit the full model, including all variables and interactions identified in a-priori hypotheses.
Step 2 Perform first model selection routine using a manual backward stepwise variable selection.
Step 3 Dredge the full model to identify the best models based on AIC.
Step 4 Choose a final model structure based on the stepwise and dredging approaches.
Step 5 Fit the final model.
In the preliminary full model, explanatory variables (fixed effects) include: LineWidth Categorical (3 levels), differentiating between (1) forested control sites, (2) wide seismic lines, and (3) narrow seismic lines. Note that categories were ordered such that they identify the maximum correlation with measured wind speed, which we now drop from the models due to high covariance (Table S3, Figure S3).

Time_hr
Time (in hours) between experimental seed release and seed observation and collection. We consider the linear and quadratic form of this variable.
We consider the following interactions in the full model: LineWidth × Time_hr Testing whether the effect of the line width (collinear with wind speed) is consistent across observation time intervals.
LineWidth × SeedType Testing whether the effect of line width (collinear with wind speed) is consistent for both goose down and Typha latifolia seed.
Finally, we include as a random effect in all models:

Site
The unique site number of each paired control and seismic test location The resulting full model took the structure:

Dispersal Distance ~ Line Width + Seed Type + Time + Time² + Line Width × Time + Line Width × Seed Type + (1|Site)
Models were fitted using the lmer command from the lme4 package for R 6

Step 2 -Backward stepwise variable selection
Individual explanatory variables were removed from the full model one at a time, then the model was rerun. The variable with the smallest effect on the response (i.e. the highest p-value) was removed at each iteration. Model selection was considered complete when all remaining variables were significant at p < 0.05. For each iteration (Run), we report the AIC, the variance explained by the fixed effects (r 2 m ), and the variables with the highest p-value in the model output (Highest p), with the p-value in brackets. We then use a cutoff of delta AIC < 4 to select the top models based on AIC 8,9 . These models can be averaged, providing averaged parameter estimates, weighted by AIC. Step 1 -Fit the full model

Step 2 -Backward stepwise variable selection
Individual explanatory variables were removed from the full model one at a time and the model re-run.
The variable with the smallest effect on the response (i.e. the highest p-value) was removed at each iteration. Model selection was considered complete when all remaining variables were significant at p < 0.05. For each iteration (Run), we report the AIC, the variance explained by the fixed effects (r 2 m ), and the variables with the highest p-value in the model output (Highest p), with the p-value in brackets. We then use a cutoff of delta AIC < 4 to select the top models based on AIC. These models can be averaged, providing averaged parameter estimates, weighted by AIC.

Construction of the final models
Step 4 -Choose a final model structure The stepwise AIC and the dredging result in very similar model selections. As well, the selection process for p50 and p95 also produce similar results. In the stepwise AIC for p50 and p95, both time variables are dropped from the model, as is SeedType from the p50 model, while they are all retained in the top dredged models. Given that SeedType is retained in the p95 models and given that both variables (SeedType and Time) test specific a-priori hypotheses, we opted to keep them in the final model.
The final model took the structure:

-Fit the final models
For each response, p50 and p95 distances, we built the final model based on the above model selection and calculate the variance explained by the fixed effects (r 2 m ) and the full model (r 2 c ). We also test the assumptions of the mixed modelling approach, including testing for normality of random effects with a Shapiro-Wilk test (shapiro.test command from the base stats package for R), visually assessing the homogeneity of residuals by plotting model residuals against fitted values using the plot.lme command from the nlme package for R 10 and normality of residuals with quantile-quantile plots using the qqnorm command from the base stats package for R. We assessed the influence of individual data points with plots of Cook's distance and DFBETAS (standardised difference of the betas when individual observations are removed, in units of SE of the parameter estimate) using the influence command from the influence.ME package for R 11 . Acceptable Cook's distances have been suggested as < 1 or < 4/n, where n is the number of units (< 0.33 in our case) and acceptable levels of DFBETAS have been suggested as < 1 or < 2/√n, where n is the number of units (< 0.58 in our case). However, it can be just as effective to examine the values graphically for visual outliers that may be exerting greater influence over the model than other points.  Figure S1, Appendix 1). However, to demonstrate the clear effect of wind speed on dispersal and to demonstrate the model fit with wind variables, we re-ran the LMMs using the measured wind speed instead of the categorical line width. For both p50 and p95 distances, linear and quadratic wind speed variables were significant in the models. In both models, the AIC was higher and variance explained was lower in wind speed models than in their counterpart line width models (Appendix 1).