Biodiversity-Based Empirical Critical Loads of Nitrogen Deposition in the Athabasca Oil Sands Region

: Anthropogenic nitrogen (N) emissions can have considerable effects on terrestrial ecosystems, with chronic N deposition leading to changes in plant species composition. The Athabasca Oil Sands Region (AOSR) represents a large point source of N emissions, which has prompted concern for surrounding habitats. The objective of this study was to determine the relative importance of N deposition as a driver of plant species community composition against bioclimatic and soil chemical variables. Further, we sought to identify community thresholds in plant species composition across a N deposition gradient. This assessment was performed for 46 Jack pine ( Pinus banksiana Lamb.)-dominant forest sites surrounding the AOSR spanning Alberta and Saskatchewan. In total, 35 environmental variables were evaluated using redundancy analysis (RDA), followed by gradient forest analysis applied to plant species abundance data. Soil chemical variables accounted for just over 26% of the total explainable variation in the dataset, followed by bioclimatic variables (19%) and deposition variables (5%), but joint effects between variables also explained a significant portion of the total variation ( p < 0.001). Total deposited nitrogen (TDN), and sulphur (TDS) along with bioclimatic and soil chemical variables, were identified as important variables in gradient forest analysis. A single, definitive threshold across TDN was identified at approximately 5.6 kg N ha − 1 yr − 1 (while a TDS threshold was found at 14.4 kg S ha − 1 yr − 1 ). The TDN threshold range was associated primarily with changepoints for several vascular species ( Pyrola asarifolia , Pyrola chlorantha , Cornus canadensis , and Arctostaphylos uva-ursi ) and bryophyte and lichen species ( Pleurozium schreberi , Vulpicida pinastri , and Dicranum polysetum ). These results suggest that across Jack pine-dominant forests surrounding the AOSR, the biodiversity-based empirical critical load of nutrient N is 5.6 kg N ha − 1 yr − 1 .


Introduction
It is well established that elevated nitrogen (N) deposition from anthropogenic sources impacts plant species diversity [1][2][3][4][5][6]. Even at low levels of N deposition, there can be considerable effects on terrestrial ecosystems [7] as communities begin to shift, favouring nutrient-rich, nitrophilous species while simultaneously reducing overall species richness [8]. Changes in species composition and biodiversity are the most evident impacts of chronic N deposition on an ecosystem [9][10][11].
In general, transportation, industry, and agriculture are the dominant sources of N emissions, and regions not immediately surrounding a point source may also be subjected to chronic N deposition owing to long-range atmospheric transport [12]. In Canada, more than 60% of all national nitrogen oxide (NO x ) emissions originated from Alberta, Saskatchewan, and British Columbia in 2017. Furthermore, Alberta and Saskatchewan were responsible for 49% of all national ammonia (NH 3 ) emissions [13]. Large-scale, open-face mining of bitumen in the Athabasca Oil Sands Region (AOSR) of Alberta represents a significant point source of N emissions [14]. As such, there is growing concern that the AOSR may be impacted by chronic N deposition [15].
Critical loads have been extensively used to underpin emissions reductions for both sulphur (S) and N [16] to preserve ecosystem structure and function [12,17]. Critical loads Nitrogen 2023, 4 170 are defined as a quantitative estimate of exposure to one or more pollutants below which no significant adverse effects can be seen on elements of the environment [10] and have been determined through empirical, mass balance, and dynamic modelling approaches [16,18]. The empirical critical loads (CL emp N) approach is based on measurable and observable responses in the environment [6,19]. In Europe, critical loads of N for various ecosystems and specific indicators of exceedance have been established [6,19,20]. For coniferous forests, CL emp N was estimated between 3-15 kg N ha −1 yr −1 , with exceedance indicated mainly by changes in soil processes, nutrient imbalances, and the changing composition of mycorrhiza and other ground vegetation [6,19]. However, forests across North America, particularly Canada, have historically experienced lower levels of N deposition compared with Europe. As a result, CL emp N for these regions are often refined to protect sensitive ecosystems that have not yet reached exceedance [21]. For instance, the recommended CL emp N for northeastern US forested ecosystems is 8 kg N ha −1 yr −1 , with a specific CL emp N for herbaceous plants and shrubs falling between 7 and 21 kg N ha −1 yr −1 , while that of lichens is between 1 and 9 kg N ha −1 yr −1 [21].
Several studies have found that the addition of site-specific soil chemical variables can provide a more robust assessment of the effects of N deposition concerning plant species' community composition [9,22]. In one study, a temporal analysis evaluating species composition under several environmental variables found that soil acidity was the most important variable for community composition [23]. In a separate study, results from 23 N addition experiments across North America suggested that a site was most susceptible to reduced species richness when certain soil parameters and bioclimatic variables aligned, specifically low cation exchange capacity and low temperatures [24]. Other studies have used an N gradient approach with site-specific data to assess the effects of N deposition. In a regional scale study, N deposition and soil pH were found to have a significant influence on species richness [25]. Similarly, soil pH was identified as significantly relevant with respect to species richness on an international scale [26]. These studies suggest that using site-specific soil-chemical variables in conjunction with deposition and climate data can provide a more realistic overview of the effects of N deposition.
Across an N gradient, thresholds concerning community composition have also been identified using a variety of statistical analyses. For instance, generalised additive models (GAMs) have been used with survey data (species presence and absence) to identify community thresholds for three different habitats across Great Britain [27]. Regression analysis was used to assess species richness using multiple environmental variables on a small regional scale for a single grassland habitat in Great Britain [25] and on a much larger scale, covering a transect across the Atlantic biogeographic zone of Europe [26]. Threshold Indicator Taxon Analysis (TITAN [28]) has also been applied in several gradient studies to evaluate species composition against a single environmental variable, N deposition [11,29,30]. Gradient-forest analysis is a relatively novel statistical technique using a gradient-based approach known to be more flexible, non-linear, and multivariate [31]. By combining all single-species responses to the environmental gradient, gradient forest analysis is similar to TITAN in that it can identify significant community thresholds. However, gradient forests differ by not only accounting for multiple environmental variables at once [32], but also identifying the relative importance of each variable for changing plant species community composition (changepoints, [31,32]).
The objective of this study was to evaluate the relative importance of N deposition, site-specific soil chemistry, and bioclimatic data as drivers of plant species composition in the ASOR. Plant species abundances data for 46 Jack pine (Pinus banksiana Lamb.)-dominant forest plots across Alberta, Saskatchewan, and the Northwest Territories were evaluated against site-specific soil-chemical variables (n = 40) and bioclimatic and deposition variables (n = 44) modelled on a regional scale. All data were first analysed through principal component analysis (PCA) to reduce the number of predictors by identifying variables that were both highly correlated and explained little to no variance. In addition, an ordination technique (RDA) was used to assess which set of predictors (soil, deposition, or climate) better explained the variation in species composition. All remaining environmental data were then analysed against individual species abundance data using gradient forest analysis to determine the importance of each variable to community composition. Gradient forest analysis was used to identify community thresholds and the species that demonstrated the greatest sensitivity across each gradient. Further, these community thresholds were assumed to be equivalent to biodiversity-based empirical critical loads of nutrient N.

Study Sites and Data Sources
Data from established semi-permanent and permanent forest monitoring plots were collected from three separate surveys across Alberta (AB), Saskatchewan (SK), and the Northwest Territories (NWT). The Wood Buffalo Environmental Association (WBEA [14]) provided data for forest plots in Alberta (n = 25), while the Forest Ecosystem Classification provided data for plots in Saskatchewan (SK-FEC [33]) and the Northwest Territories (NWT-FEC [34]). The 25 plots from the WBEA survey were classified as Jack pine dominant. To maintain data continuity, all plots selected in Saskatchewan and the Northwest Territories were only selected if Jack pine was the dominant forest stand. Sites were classified as Jack pine dominant forests if total canopy coverage was at least 10% or more of a site and, of that, Jack pine trees covered 50% or more of the existing canopy. These sites were mainly located across the boreal shield and boreal plains ecozones and generally characterised by mesotrophic soils, ranging from xeric to mesic moisture levels, poor nutrient status, and soils that ranged from dry, well-drained sandy soils to moist soils. Only sites that were not recently subjected to fire disturbance events were selected, with WBEA and NWT-FEC reporting a minimum of 60 years of post-fire disturbance range in their sampling criteria, while SK-FEC reported at least 40 years of post-fire disturbance for all sites.
Plant species data from all surveys were recorded as cover-abundance estimates per site (surveyed between approximately 2001 and 2011). In addition to abundance data, the WBEA survey included soil chemistry data for their 25 plots. To add to the soil chemistry data, 21 plots across Saskatchewan and the Northwest Territories were selected for soil analysis. In total, 46 sites (n SK-FEC = 15, n NWT-FEC = 6, and N WBEA = 25; Figure 1) were selected for statistical analysis. For additional plot information, see Supplementary Materials SI and SII.

Site Sampling Methods
Under the current study, 21 of the 46 selected study sites were visited and uniformly sampled. Soil sampling was performed at the Saskatchewan sites during August and September 2014 (n SK-FEC = 15), while sites in the Northwest Territories were visited in October 2014 (n NWT-FEC = 6). A 40 m transect was established at the centre of each study site, ensuring that vegetation was consistent across the entire transect. Five pits were established at 10 m intervals along the transect, with pit 1 beginning at 0 m. Mineral topsoil (0-15 cm) samples were collected using a soil auger below the litter layer for all five pits. Bulk-density cores were collected at pits 1, 3, and 5 (0-15 cm depth). A 17.6 cm by 13.6 cm sample of the forest floor was also collected at pits 1, 3, and 5 for each study site. All samples were stored in ziplock bags until further analysis was performed. The 25 plots in Alberta were sampled by WBEA [14]. Forest floor and mineral soil were sampled by WBEA and archived following their analyses. Archive samples were then transferred to Trent University for chemical analysis.

Figure 1.
Location of Jack pine dominant forest plots across Alberta, Saskatchewan, and the Northwest Territories (n = 46). Sites are colour coded by their total deposited nitrogen (TDN in eq ha −1 yr −1 ; red represents the highest TDN and yellow the lowest TDN). The inset depicts the location of the study area in north America (east-west distance is ~475 km).

Site Sampling Methods
Under the current study, 21 of the 46 selected study sites were visited and uniformly sampled. Soil sampling was performed at the Saskatchewan sites during August and September 2014 (nSK-FEC = 15), while sites in the Northwest Territories were visited in October 2014 (nNWT-FEC = 6). A 40 m transect was established at the centre of each study site, ensuring that vegetation was consistent across the entire transect. Five pits were established at 10 m intervals along the transect, with pit 1 beginning at 0 m. Mineral topsoil (0-15 cm) samples were collected using a soil auger below the litter layer for all five pits. Bulk-density cores were collected at pits 1, 3, and 5 (0-15 cm depth). A 17.6 cm by 13.6 cm sample of the forest floor was also collected at pits 1, 3, and 5 for each study site. All samples were stored in ziplock bags until further analysis was performed. The 25 plots in Alberta were sampled by WBEA [14]. Forest floor and mineral soil were sampled by WBEA and archived following their analyses. Archive samples were then transferred to Trent University for chemical analysis.

Laboratory Analysis Methods
Sampling procedures for the WBEA survey collected soils at several different depths including 0-5 cm and 5-15 cm. To increase homogeneity between plots, archived WBEA mineral soil samples were composited for depths 0-5 and 5-15 cm for each site, creating a single depth of 0-15 cm. For samples collected during fieldwork (SK and NWT), all pits for each plot were composited to create a single sample (0-15 cm). Soluble anions (including nitrite and nitrate) were determined by performing water extractions on 4 g of fresh, composite soil samples for the NWT and SK study sites. Samples were then refrigerated, and analysis of the resulting aqueous solution was performed by ion chromatography within 5 days. For all 25 plots in Alberta, WBEA performed identical soluble anion analyses on water extractions using ion chromatography [35].
All samples collected during fieldwork (NWT and SK samples) were air-dried and sieved to <2 mm to remove stones, roots, and other debris before further analysis was performed. Archived WBEA samples were already air-dried and sieved. All samples were Figure 1. Location of Jack pine dominant forest plots across Alberta, Saskatchewan, and the Northwest Territories (n = 46). Sites are colour coded by their total deposited nitrogen (TDN in eq ha −1 yr −1 ; red represents the highest TDN and yellow the lowest TDN). The inset depicts the location of the study area in north America (east-west distance is~475 km).

Laboratory Analysis Methods
Sampling procedures for the WBEA survey collected soils at several different depths including 0-5 cm and 5-15 cm. To increase homogeneity between plots, archived WBEA mineral soil samples were composited for depths 0-5 and 5-15 cm for each site, creating a single depth of 0-15 cm. For samples collected during fieldwork (SK and NWT), all pits for each plot were composited to create a single sample (0-15 cm). Soluble anions (including nitrite and nitrate) were determined by performing water extractions on 4 g of fresh, composite soil samples for the NWT and SK study sites. Samples were then refrigerated, and analysis of the resulting aqueous solution was performed by ion chromatography within 5 days. For all 25 plots in Alberta, WBEA performed identical soluble anion analyses on water extractions using ion chromatography [35].
All samples collected during fieldwork (NWT and SK samples) were air-dried and sieved to <2 mm to remove stones, roots, and other debris before further analysis was performed. Archived WBEA samples were already air-dried and sieved. All samples were weighed before and after being oven-dried at 105 • C for 12 h to calculate their percent moisture. Soil pH was determined for all samples using a pH probe in a 0.01 M CaCl 2 matrix.
Loss on ignition (LOI) was also determined for all samples by ashing air-dried samples in a muffle furnace at 400 • C for 10 h and calculating the mass difference in pre-and post-ignition samples. Using ignited samples, particle size analysis (percent sand, silt, and clay) was determined using a Horiba LA-950 analyser on all samples. Exchangeable cations (EC) were determined on SK and NWT samples using a 1 M ammonium chloride (NH 4 Cl) extraction. Extractions were performed by adding 25 mL of NH 4 Cl to 5 g of air-dried soil in a conical tube. All samples were shaken for 2 h and then left to stand for 1 h. The solution was then filtered through a Whatman 42 filter paper. An additional 25 mL of NH 4 Cl was then added to the conical tube to rinse out any residual soil, and the extracted solution was then analysed for major cation concentrations using an inductively coupled plasma optical emissions spectrometer (ICP-OES). All 25 samples collected by WBEA underwent comparable exchangeable cation analysis with some minor differences. Following the addition of 25 mL of 1 M NH 4 Cl to 2.5 g of 2 mm air-dried mineral soil, rapid extractions were conducted to gather as close to 15 mL of solution as possible into a collection syringe. Approximately 30 mL of NH 4 Cl was then added to the collection tube to rinse any residual soil from the sidewalls, and additional extractions were performed for another 12 h. Cation concentrations were then analysed using an ICP-OES spectrometer [35].
Nitrogen mineralization was determined using a 2 M KCl extraction for all SK and NWT air-dried samples. For ammonium (NH 4 + ) and nitrate (NO 3 − ) extraction, 40 mL of 2 M KCl was added to 4 g of air-dried soil and shaken for 2 h. The solution was then filtered through a Whatman 42 filter paper, and the solution was analysed on a Bran Luebre continuous flow analyser. For WBEA samples, 25 mL of 2 M KCl was added to 2.5 g of air-dried mineral soil in a polypropylene tube and shaken for 1 h. The solution was then decanted and centrifuged at 2000× g for 15 min. The solution was then analysed using colorimetric determination and a segmented flow analyser complete with a dialysis membrane to prevent interference from solids or coloured co-extractives [35].
Soil samples were pulverised for 3 min using a SPEX Miller Mill 6000D and analysed for percent carbon (C), nitrogen (N), and sulphur (S) using an Elementar Vario Max analyser. Bulk density was also determined for SK and NWT samples using soil cores. Soil cores were weighed and dried at 105 • C for 24 h. Dried soil cores were sieved at 2 mm to isolate coarse fractions greater than 2 mm, and total bulk density and fine fraction bulk density were determined. The average bulk density across a site was then reported, and the average percent porosity was calculated.
All plots with forest floor samples from SK and NWT were composited by site and air-dried, while samples from the WBEA survey were already air-dried and sieved before being transferred to Trent University. Forest floor samples from SK and NWT were weighed before and after air-drying, and the air-dried samples were then ground. The pH of the forest floor on all ground samples was determined using the same CaCl 2 method used for mineral soil samples. The same method used to determine water-extractable anions in mineral soil was also used for air-dried forest floor samples. Finally, samples were further milled using the SPEX Mixer Mill 6000 for an average of 8 min and analysed for percent CNS.

Environmental Data
Modelled nitrogen and sulphur deposition data (in eq ha −1 yr −1 ) were obtained from Global Environmental Multi-Scale Modelling Air Quality and Chemistry (GEM-MACH [36]). Data were obtained at a resolution of 2.5 km × 2.5 km. Modelled deposition scenarios were based on emissions inventories from 2006 and 2010, with 2010 data focused on the Athabasca Oil Sands. Nitrogen deposition data included 11 different species, while S deposition data included four different species (Supplementary Materials SIII). Total N and total S deposition were also included [36]. In addition to deposition data (n = 17), elevation, longitude, latitude, and 24 bioclimatic variables were also included in the analysis (Supplementary Materials SIII).

Statistical Analyses
Initially, 40 soil chemistry variables (Supplementary Materials SIII) and 44 modelled deposition and bioclimatic variables (a total of 84 variables) were considered. However, many studies have outlined the importance and challenges of variable reduction and variable selection with respect to ecological modelling, noting that effectively reducing the number of predictors can increase the reliability and stability of the results [37]. To assess the effects of relevant explanatory variables, PCA was applied to all 84 environmental predictors (Supplementary Materials SIII) before gradient forest analysis. Described as a linear transformation tool, PCA can provide information on data variation but can also be used for exploratory factor analysis to identify variables that are correlated [38,39]. Environmental variables were removed if they were highly correlated with another variable and shared similar environmental processes, or if the variable accounted for very little to no variation in the dataset.
Species abundance data were analysed to determine whether a linear or unimodal ordination approach was better suited for the data, subsequently identifying whether canonical correspondence analysis (CCA) or redundancy analysis (RDA) was applied. This was performed using detrended correspondence analysis (DCA) in the vegan package [40]. DCA identified the first axis as a mid-length gradient (between 3 and 4 SD units), indicating that either a unimodal (CCA) or a linear ordination (RDA) method could be used. While CCA, a weighted form of RDA and a popular ordination technique used in ecological studies, generally allows for easy interpretation of species assemblages [41,42], several drawbacks exist. Among the more relevant drawbacks are that rare species can have a larger influence on the analysis. This was of particular concern with the dataset, given that a species was only considered rare if it appeared across no more than a single site. In addition, total inertia (variance explained) is represented by R 2 rather than adjusted R 2 [41], meaning that the results obtained would be biassed and unadjusted for the number of predictors. Consequently, RDA, a multivariate method combining both multiple linear regressions and PCA [41], was selected as the preferred ordination method. RDA was performed on each category of predictors (bioclimatic, deposition, and soil chemical variables). Variance inflation factors (VIFs) were then analysed to ensure that multicollinearity was reduced and that the selected variables produced stable and interpretable results. While both climatic and deposition variables returned some high VIFs, only deposition variables were reanalysed due to the potential for overlapping environmental processes between variables. To further select only relevant and statistically important deposition variables, stepwise regression was used [40].
All statistical analysis for variable selection was performed in R Studio Version 1.0.136 using the vegan package [40]. Following variable reduction, 49 predictors were removed using variable selection methods, leaving 35 variables for further analysis (Table 1; see Supplementary Materials SIII). These environmental variables were first analysed using the varpart function in vegan to determine the amount of variance explained by each category of data, and then analysed with species data using gradient forest analysis.   Gradient forest analysis identifies thresholds for multiple species across multiple environmental gradients using non-parametric, non-linear, and highly flexible functions and can be used to identify both the shape and magnitude of compositional change along environmental gradients [43][44][45][46]. Gradient forest analysis was performed in R using two packages, extendedForest and gradientForest [44,47]. The extendedForest R package is an extended version of the randomForest package based on Breiman's random forest models [47,48] that uses the average result of a forest of regressions or classification trees (based on available data) to determine a threshold value [31,43]. In our case, the R package randomForest used regression trees and repeatedly partitioned sites to maximise the homogeneity of the groups with respect to the response variable. Partitions are based on a split value (s) of some predictor (p), in which any sites partitioned to the left are sites with a predictor value smaller or equal to the split value, while to the right are sites with a predictor value greater than the split value [31,32,45]. At each partition, the split value is selected to minimise the sum-of-squares deviation of the species abundance data (reduce impurity), in turn maximising what is called fit improvement [31,32]. The fit improvement or reduction in impurity of the species abundance data determines the importance of a split and is a measurement of how much variation has been explained by that partition [31,32,43]. This partitioning method is repeated until only a minimum number of sites in the partition is reached, eventually forming a terminal node where the predicted value is equal to the mean response of the node [31]. Each split, which can be interpreted as a threshold, results in two separate branches, eventually forming a tree [32,45]. Additionally, the instability of an individual tree is reduced by repeating the construction of a tree a large number of times (creating a forest), a function that can be set in the gradientForest package [44].
Each tree is fitted to a random sample of observations, termed "in-bag" and each split is determined from a random subset of environmental variables, resulting in raw importance values [32,45]. The observations not used in the construction of the tree, termed "out-of-bag" observations, are then used to cross-validate the performance of a single tree [45]. The mean cross-validated performance of all trees in a forest results in the goodness-of-fit measure (predictive performance R 2 S value [45]). The predictive performance R 2 S values determined in extendedForest represents, for each species, the proportion of variance explained by the random forest model and are determined by evaluating the prediction error in the model without a given environmental variable and comparing it to the prediction error of the full model [31,32,49].
In addition to the predictive performance R 2 S value, randomForest also produces accuracy importance values and raw importance values that are used to assess community and species compositional changes along environmental gradients. The extendedForest package also attempts to reduce the effect of correlation between environmental variables by performing conditional permutations rather than marginal permutations [32]. Conditional permutations are used to control for inflated importance measures based on correlated environmental variables and differ from marginal permutations in that environmental variable values are permuted within bins of data defined by tree splits in the forest for predictors correlated above a specific threshold, r [32,43,45]. The results from all random forests are then combined in the gradientForest R package.
In gradientForest, the predictive performance R 2 S values can be partitioned into contributions (R 2 SP ) from each predictor in proportion to the predictor's importance. These contributions (R 2 SP ) can then be averaged across all species to provide the overall conditional importance of a predictor (R 2 weighted importance [49]). The R 2 weighted importance values serve to classify environmental variables in order of importance for changing community composition [49]. Accuracy importance values, defined in extendedForest, are also provided as a measure of the importance of each variable by assessing its contribution to the overall model's goodness-of-fit [49]. Accuracy importance values are determined by the increase in the out-of-bag mean square prediction error or a reduction in performance when a given environmental variable is randomly permuted [45,49]. Large accuracy importance values indicate that an environmental variable has true predictive power, while small or negative values indicate the environmental variable explains very little or nothing, and model accuracy would not be affected if the variable were removed [49]. The gradientForest package then, in proportion to the accuracy importance values defined in extendedForest, distributes the predictive performance R 2 S values for all species attributed to the predictor over the gradient of each environmental variable [31,49], thereby defining compositional turnover functions for each variable and displaying both individual species and community level thresholds using graphical outputs [45,49].

GradientForest Outputs
Gradient forest analysis provides several measures of variation in the model, including R 2 overall importance and R 2 weighted importance. In gradientForest, R 2 performance values (R 2 S ) are given to each species for each environmental variable. The sum of all R 2 S values for any given species provides the overall R 2 performance of that species and is an indication of how well variation within that species is predicted by all environmental variables, with greater values indicating a better model fit. In gradientForest, any species with a zero or negative R 2 S performance value is excluded from the model. In addition, R 2 -weighted importance values are produced for each environmental variable by averaging out all R 2 S values for a given environmental variable across every species in the model. The R 2 weighted importance value provides a measure of how much variation in the model is predicted by each environmental variable. The mean R 2 weighted importance value is an indication of how much variation in the entire model is explained by all environmental variables analysed.
GradientForest produces density plots that highlight regions of importance for changing community composition across an environmental gradient. In these plots, results are smoothed with density curves to assist in interpretation. Raw importance values are first aggregated into narrow bins and displayed as grey histogram bars (binned split importance values). Split locations, interpreted as thresholds or breakpoints in the community, represent both locations and relative importance across the gradient [32,49]. The density of splits (black line) is obtained through kernel density estimation, and the ratio of densities (the blue line, representing an estimate of composition turnover) is obtained by dividing the normalised density of data (red line). The resulting area under the blue curve is equivalent to the R 2 -weighted importance of each predictor. Together, the density of data, the ratio of densities, and the raw importance values are normalised and allow for rates of composition turnover to be compared across all predictors [31]. In the density plot, peaks in the blue curve occur when the ratio of densities is greater than 1 and the density of splits outweighs the density of data [31,32,45], identifying regions where there is a reduction in impurity and the associated threshold is of greater importance. Cumulative importance plots are also provided in gradientForest. The first plot displays compositional turnover functions (species curves) for each predictor and identifies individual species that are strongly associated with specific community thresholds (or breakpoints) across each predictor gradient [31]. Species curves can also be averaged to produce an overall composition turnover function for the community across a given predictor, showing the overall community response across a given gradient [31,32].
Here, we used gradient forest analysis to evaluate plant species abundance data for 46 Jack pine dominant forest plots against 35 predictors (following the removal of 49 of the initial 88 predictors using variable selection methods). Plant species data across all analysed sites included abundances for 206 species (n Vascular = 124, n Bryophyte = 36, n Lichen = 44, n Fungi = 2). Species that occurred at a single site only were considered rare and removed prior to analysis. Gradient forest was performed in R using both "extendedForest" and "gradientForest" R packages [45,47]. The analysis was performed by setting the gradientForest call to generate a forest of 500 regression trees (n tree = 500), setting the correlation threshold to 0.5, and dividing the environmental gradients into 201 bins to define density estimation (n bin = 201, [45]).

Results
Three categories of variables (climate, deposition, and soil chemistry variables) were evaluated for 46 sites across Alberta, Saskatchewan, and the Northwest Territories (but primarily located around the AOSR). Total deposited sulphur (TDS) ranged from 23.2 to 1182.2 eq ha −1 yr −1 (mean = 179.4 eq ha −1 yr −1 ; Table 1), while total deposited nitrogen (TDN) ranged from 37.2 to 597.0 eq ha −1 yr −1 (mean = 176.4 eq ha −1 yr −1 ; Table 1). Total N and S deposition were both positively skewed (TDN = +0.30, TDS = +0.38). Although the two greatest contributors to TDN were dry nitrogen dioxide and dry ammonia, both variables were removed following variable reduction using PCA. Annual mean temperature (sg_12) ranged from −2.8 to 1.6 • C (mean = 0.02 • C), while annual precipitation (bio_12) ranged from 359 to 483 mm (mean = 451.5 mm). The elevation of the sites varied greatly, ranging from 231 m to 622 m, with a mean of 425 m. Sites were classified as mostly acidic, with pH ranging from 3.59 to 7.19 (mean = 4.32) and bulk density ranging from 0.94 to 1.64 g cm −3 (mean = 0.58 g cm −3 ). Percent organic matter ranged between 0.40% and 41.72%, with a mean of 2.78%. Correspondingly, the percent moisture ranged between 0.09% and 8.25% (mean = 0.69%). Some variables experienced high standard deviations, particularly TDN (standard deviation [SD] of 131.6 eq ha −1 yr −1 ) and elevation (SD = 108.3 eq ha −1 yr −1 ). See Supplementary Materials SIII for descriptive statistics for all environmental variables initially considered.
Partitioning the variance into three environmental groups (climate, deposition, and soil chemistry) indicated that all three accounted for 52.9% of the overall variance. Soil chemical variables accounted for the largest percentage of the total explained variation (26.6%), with climate variables at 19.1%. Deposition variables accounted for the least variation, at 5.0% of the total explained variation. Nonetheless, this is important as the deposition is an anthropogenic pressure that can alter the plant species composition dictated by site conditions (soil chemistry and climate). The joint effect of all three partitions together accounted for 43.3% of the total explained variation. While the joint effect of climate and deposition was minimal (0.02%), the joint effect of climate and soil was 13.0% of the total explainable variation. The partition table also returned the joint effect of deposition and soil, which suggested that this combination was less representative of the total explainable variation than random normal variables. Although the total variation was just over half of the full model variation (52.9%), the contribution to the variation in species abundance data was statistically significant (p < 0.001).
Abundance data for the 206 species across the 46 sites were analysed in gradient forest against the 35 environmental variables. Analysis revealed that of the 206 species, 48 had positive R 2 overall performance values, and the mean R 2 weighted importance was 0.00685 ( Table 2). Seven of the top 48 species with positive R 2 values were established as very well predicted by the environmental variables (R 2 > 0.4, Table 3, Figure 2), indicating overall model performance was particularly well explained for these species. Several environmental variables identified with high R 2 S performance recurred across the top seven species, including forest floor pH (pH_LFH), Julian day number at the start of the growing season (sg_01), percent nitrogen (X.N), percent moisture (X._Moisture), latitude, and Nmineralization (N.NH 4 ), demonstrating their importance concerning changes occurring within these specific species groups (Table 3).
Nitrogen 2023, 4, FOR PEER REVIEW 11 Figure 2. Species R 2 overall performance is a measure of the fit of the random forest model for each species, with all 46 species with positive R 2 values labelled on the X-axis. The red outline identifies seven species that were particularly well predicted by the environmental variables (R 2 > 0.4; see Table 3). Species with zero or negative R 2 overall performance values are not included.  Figure 2. Species R 2 overall performance is a measure of the fit of the random forest model for each species, with all 46 species with positive R 2 values labelled on the X-axis. The red outline identifies seven species that were particularly well predicted by the environmental variables (R 2 > 0.4; see Table 3). Species with zero or negative R 2 overall performance values are not included.  The R 2 weighted importance value for percent moisture (X._Moisture) was found to be less than one-third of that of the top R 2 weighted importance value (0.00474 compared to longitude at 0.0147, Table 2), indicating that any variable below this explains substantially less of the overall model variance. All variables with R 2 weighted importance greater than percent moisture had an associated positive mean accuracy importance value, except exchangeable cations-aluminum (Al_EC, Figure 3). Accuracy importance was greatest for Gdd (growing degree day) above base temperature for period 1 (sg_09), N-mineralization (N.NH 4 ), and forest floor percent sulphur (X.S_LFH) (311.4, 294.1, and 241.3, respectively, Figure 3), and of all 35 variables analysed in gradientForest, 29 were identified with positive accuracy importance values (Figure 3). However, any variables after forest floor percent sulphur (X.S_LFH) had relatively low accuracy importance, indicating that while these variables did have some effect on model accuracy, they had considerably less predictive power. Among the top 20 predictors, soil chemistry variables were the dominant category (based on variable classification using R 2 weighted importance values, Table 2), supporting the RDA variation partitioning results (with soil variables alone explaining 26.6% of the variation in the data).
Nitrogen 2023, 4, FOR PEER REVIEW 13 however, in the second split, the density of data was considerably lower, indicating that this split accounted for a greater reduction in impurity and a slightly more prominent effect on overall changes in community composition (Figure 4c). Changes in species composition occurred across both split locations, with the first split being associated with more lichen species. Several lichen species demonstrated considerable changes in this range, including Cladina mitis and Cladonia multiformis, but their contribution to overall community changepoints was more gradual and less prominent than the secondary breakpoint ( Figure 4b). The secondary breakpoint demonstrated a more significant community shift and was associated with more vascular species (Figure 4b). Cladina mitis was also identified with the highest specific R 2 S performance value (0.112, Table 3), indicating that variation within this species was best explained by the N-mineralization predictor.  Table 1).
Other soil chemical variables were returned with high R 2 -weighted importance, with both mineral soil and forest floor (LFH) pH identified as contributing to significant changes in community composition (Figure 5a). The density plot for mineral soil pH revealed two community breakpoints where the ratio of densities was greater than 1 ( Figure  5). The first breakpoint, occurring around a pH of 2, was less pronounced than the second breakpoint, occurring around a pH of 6 ( Figure 5). The density of data and binned split importance values (grey histogram bars) peaked between pH 3 and 5. Although a low pH can indicate soil acidification [54], the associated density of splits in this range was very low. This indicates that any compositional changes occurring in this range did not result in a significant community shift ( Figure 5). -weighted importance shows the overall conditional importance of predictors for community turnover, calculated by weighting the species-level predictor importance by the species R 2 (codes refer to environmental variables in Table 1).
Several bioclimatic variables were also identified as having high R 2 weighted and accuracy importance (similar to [50]), with both longitude and latitude returning as top predictors based on their R 2 weighted importance values (R 2 = 0.0147 and 0.0146, respectively; Table 2). However, the associated accuracy importance value for longitude was relatively low (5.55, Figure 3). In addition to the low predictive power within this model, other studies have highlighted the significant role that climate and geographic variables already play in species composition [45,51]. As a result, several bioclimatic variables were precluded from further review of gradientForest results. The variables selected for further consideration had both high R 2 weighted importance and associated mean accuracy importance and were highly influenced by anthropogenic activity. Gradient forest analysis identified several soil chemical variables as important with respect to changing community composition, including percent N and N-mineralization (Figure 3), which are both indicators of N saturation [52,53]. The density of splits plot for the percent N gradient revealed a single split across the gradient, occurring between 0.06 and 0.08% (Figure 4a). Species primarily associated with this community breakpoint were mostly vascular species, including Aster sibiricus, Carex concinna, and Rubus pubescens (Figure 4b,c), with Aster sibiricus having the greatest specific R 2 S performance value across this gradient (0.144, Table 3). The density plot for N-mineralization revealed two important splits occurring across the gradient, the first between 1-3 mg kg −1 and again between 8-10 mg kg −1 (Figure 4a). Across both breakpoints, the density of splits was fairly high; however, in the second split, the density of data was considerably lower, indicating that this split accounted for a greater reduction in impurity and a slightly more prominent effect on overall changes in community composition (Figure 4c). Changes in species composition occurred across both split locations, with the first split being associated with more lichen species. Several lichen species demonstrated considerable changes in this range, including Cladina mitis and Cladonia multiformis, but their contribution to overall community changepoints was more gradual and less prominent than the secondary breakpoint ( Figure 4b).
The secondary breakpoint demonstrated a more significant community shift and was associated with more vascular species (Figure 4b). Cladina mitis was also identified with the highest specific R 2 S performance value (0.112, Table 3), indicating that variation within this species was best explained by the N-mineralization predictor.
Other soil chemical variables were returned with high R 2 -weighted importance, with both mineral soil and forest floor (LFH) pH identified as contributing to significant changes in community composition (Figure 5a). The density plot for mineral soil pH revealed two community breakpoints where the ratio of densities was greater than 1 ( Figure 5). The first breakpoint, occurring around a pH of 2, was less pronounced than the second breakpoint, occurring around a pH of 6 ( Figure 5). The density of data and binned split importance values (grey histogram bars) peaked between pH 3 and 5. Although a low pH can indicate soil acidification [54], the associated density of splits in this range was very low. This indicates that any compositional changes occurring in this range did not result in a significant community shift ( Figure 5).
The cumulative importance plots for pH in soil confirmed that the secondary peak was more important with respect to shifts in community composition (Figure 5c). Several species experienced substantial changes across this secondary breakpoint (pH of 6), including Epilobium angustifolium, Shepherdia canadensis, Viburnum edule, Pyrola secunda, and Hylocomium splendens (Figure 5b), with the latter having the greatest specific R 2 S performance value across the gradient (0.0967; Table 3). The density plot for forest floor pH (LFH) demonstrated similar trends to pH in mineral soil; however, only one community breakpoint was identified (peaking at pH 6; Figure 5a). Although the density of data and binned split importance were greater between pH 3 and 4, the density of splits was relatively low, indicating that very little occurred here with respect to compositional shifts at the community level (Figure 5a). The cumulative importance plots for pH in the forest floor identified several species, including Carex concinna, Aster sibiricus, Rubus pubescens, Habenaria orbiculata, and Zygadenus elegans, in significant compositional changes that contributed to overall community breakpoints (Figure 5b). The variation in Carex concinna was best explained by forest floor pH (pH_LFH), with a specific R 2 S performance value of 0.164; Table 3).
Two important breakpoints were identified in the density plot for TDN. The first occurred at a low range in the gradient (between 0-100 eq ha −1 yr −1 ), while the second breakpoint, occurring between 300 and 500 eq ha −1 yr −1 (peaking around 400 eq ha −1 yr −1 [5.6 kg N ha −1 yr −1 ]) was more pronounced, indicative of a higher rate of change with respect to community composition in this range (Figure 6a; [32]). Important splits across this gradient appeared to occur between 100 and 200 eq ha −1 yr −1 ; however, the density of data was relatively low compared to the density of splits. This indicates that species experiencing a change in this range did not significantly contribute to any community shift, as seen in both cumulative plots ( Figure 6). Across the 300-500 eq ha −1 yr −1 range, the density of splits outweighed the density of data, indicating that, although fewer splits occurred in this range, they accounted for a greater reduction in impurity, contributing to a more significant community threshold (Figure 6a). The cumulative importance plots for pH in soil confirmed that the secondary peak was more important with respect to shifts in community composition (Figure 5c). Several species experienced substantial changes across this secondary breakpoint (pH of 6), including Epilobium angustifolium, Shepherdia canadensis, Viburnum edule, Pyrola secunda, and Hylocomium splendens (Figure 5b), with the latter having the greatest specific R 2 S relatively low, indicating that very little occurred here with respect to compositional shifts at the community level (Figure 5a). The cumulative importance plots for pH in the forest floor identified several species, including Carex concinna, Aster sibiricus, Rubus pubescens, Habenaria orbiculata, and Zygadenus elegans, in significant compositional changes that contributed to overall community breakpoints (Figure 5b). The variation in Carex concinna was best explained by forest floor pH (pH_LFH), with a specific R 2 S performance value of 0.164; Table 3).

Variance Partitioning and Joint Effects
Variance partitioning revealed that all three categories of variables explained a statistically significant 52.9% of all variation. Soil chemistry variables explained the greatest amount of variation, followed by bioclimatic and then deposition variables (26.6, 19.1, and 5.0%, respectively). This is consistent with previous studies showing that soil chemistry variables (followed by climate and then deposition variables) explained the most variation Community thresholds can also be observed in the cumulative importance plots (Figure 6b,c). Shifts in vascular species were predominantly associated with greater levels of TDN (Figure 6b). At low levels of TDN (less than 100 eq ha −1 yr −1 ), Hylocomium splendens, a bryophyte, experienced the greatest cumulative importance change (Figure 6b), and respective specific R 2 S performance values revealed that TDN best explained the variation within this species (Table 3). A small number of lichen species, including Evernia mesomorpha, experienced a sudden rise in specific cumulative importance between 100-200 eq ha −1 yr −1 , although, given the decline in the density of data and that the ratio of densities remained less than 1 (Figure 6a), there was no effect on the overall community. From both the density plot and the cumulative importance plots, it appeared that the secondary breakpoint, at 400 eq ha −1 yr −1 , was representative of a more clearly defined community-level threshold for nitrogen deposition. The density plot for TDS revealed two distinct breakpoints across the gradient. While the first appeared as a broad breakpoint, the second was more pronounced (Figure 6a). The cumulative plot revealed a much smaller threshold at very low levels of TDS. While many important splits occurred between 0-100 eq ha −1 yr −1 , large changes in species abundance corresponded with relatively low, incremental increases in total sulphur, indicating that the community threshold was minor (Figure 6b,c). Across the first distinct breakpoint from the TDS density plot, some splits occurred between 400-600 eq ha −1 yr −1 ; however, the cumulative importance curve indicated that no large shifts in any single species took place. Instead, this breakpoint was likely associated with the many very minor changes occurring in this range, resulting in the more gradual slope observed for cumulative importance plots (Figure 6b,c). This indicates a much smaller effect on community composition. The integration of the ratio of densities for each species (specific cumulative importance plot) revealed that several species were experiencing relatively small cumulative changes across both thresholds observed in the density plot, including Populus tremuloides, Dicranum polysetum, Melampyrum lineare, and Amelanchier alnifolia (Figure 6b). However, Elymus innovatus had the highest R 2 S performance value (Table 3). While both breakpoints were relatively important with respect to community thresholds, the second breakpoint, peaking around 900 eq ha −1 yr −1 , represented a higher rate of change (Figure 6c). Consistent with TDN, this secondary breakpoint was more pronounced, and from these results, it was more representative of an existing community-level threshold for sulphur deposition.

Variance Partitioning and Joint Effects
Variance partitioning revealed that all three categories of variables explained a statistically significant 52.9% of all variation. Soil chemistry variables explained the greatest amount of variation, followed by bioclimatic and then deposition variables (26.6, 19.1, and 5.0%, respectively). This is consistent with previous studies showing that soil chemistry variables (followed by climate and then deposition variables) explained the most variation in plant species composition across European acid grasslands [55] (Stevens et al., 2011b). Atmospheric deposition variables explained the least amount of total variation in species composition, which is consistent with other studies [9,55,56] and may be attributed to differences in the scale of data. Assessing the effects of N deposition at a regional scale often relies on using monitoring techniques and strategies that provide long-term data and cover a wide spatial range. In this study, we used modelled estimates of deposition at a resolution of 2.5 km × 2.5 km coupled with site-specific observations of plant species abundances and soil chemistry. However, modelled N deposition at a broad spatial scale can vary from actual N deposition at a local scale [9,57,58]. Differences between modelled and actual N deposition can be attributed to several factors, including the type of deposition (e.g., wet or dry deposition, [4,59]), proximity to the emissions source, whether they are point-source emissions or not, and the source itself (whether agricultural or industrial, [60,61]). As a result, determining site-specific impacts using regional estimates of N deposition data can be challenging. Therefore, it is appropriate that soil chemistry variables explain the greatest amount of variation because, unlike bioclimatic or deposition variables, soil-chemistry variables in this study were based on soils sampled at the same locations as the plant surveys.
The partition table also revealed that joint effects for some variables were contributors to the total explainable variation. In the current study, the joint effect of climate and soil was 13.0% of the total explainable variation, and with climate representing one of the major five factors contributing to soil formation, these effects are well documented [62]. Although the joint effect of deposition on soil was negligible in this study, the combination of all three categories acting jointly represented a significant 43.3% of the total explainable variation. While it is possible that the effects of deposition on soil could not be captured by the selected variables in this study, many studies have documented that deposition can heavily influence soil properties [3,24,55,56,63]. Isolating the effects on species composition from either soil chemistry variables or deposition can be challenging due to the long-term changes that can occur as a result of increased deposition. These changes can include soil processes such as acidification or eutrophication, which can eventually result in changes in species composition [56,64]. Given the inherent influence that atmospheric deposition can have on soil, some argue that neither should be considered fully independent of the other [55,65].

Drivers of Community Thresholds for Jack Pine Forests in the AOSR
Gradient forest analysis identified five bioclimatic variables among the most important drivers of community thresholds ( Figure 3). This could be a result of the greater variability across a large spatial scale, resulting in a larger environmental gradient and an increase in background environmental noise [9,56,64,66]. Although this study attempted to reduce background variability by selecting a single forest type across a relatively localised region, our results still identified longitude and latitude as top predictors, consistent with previous studies [45,50].
In addition to the bioclimatic predictors, two deposition variables (TDN and TDS) and several soil variables were also important drivers of community thresholds (Figure 3). In our study, percent N, N-mineralization, and pH (forest floor and mineral) were among the soil variables with high predictor importance. The important relationship between these variables and plant community thresholds has been described in several studies evaluating both the direct and indirect effects of deposition on soil [3,56,63,[67][68][69]. Increased S or N deposition can lead to decreases in soil pH, which can subsequently have a cascading community impact. These impacts can range from soil acidification, eutrophication, base cation depletion [70,71], and mobilisation of heavy metals, particularly aluminium (Al, [72][73][74], to changes in N-mineralization and nitrification rates, both important processes that convert organic N into NH 4 + and NH 4 + into NO 3 − [63,75]. Additional deposition variables that can affect soil pH include dust deposition, which is effectively a surrogate for base cations and can represent an indication of soil alkalization [14]. Although in our study, modelled estimates of particle crustal material, representing coarse dust deposition, were initially included in our analysis, no observed effect on species composition was seen, and the variable was not considered further. Ultimately, our results support the important implications of plant available N (N.NH 4 ), percent N, and pH in driving community thresholds ( Figure 3). While the net result of N additions can be large imbalances in nutrients and less plant-available N, this complex cascade of community impacts can eventually result in a loss in plant species composition [67,71,76].
Total N deposition and TDS were considered separately in this study; however, they were correlated (0.85). Several studies have revealed similar findings, suggesting that existing correlations between the two variables create difficulties in isolating their level of influence on species composition [55,56]. Although there has been a concerted effort over the past few decades to reduce S emissions, evidence of a stronger relationship between pH and total acidic deposition instead of just N deposition alone exists [3,64]. This is indicative that S deposition remains an important driver of species composition.

Biodiversity-Based Empirical Critical Loads
In this study, gradient forest analysis provided both community thresholds and independent species response information. Across the TDN gradient, a single, definitive threshold of 400 eq ha −1 yr −1 (5.6 kg N ha −1 yr −1 ) was identified (Figure 6), which was assumed to represent the biodiversity-based empirical critical load for nutrient N. This threshold was highly associated with changes in multiple vascular species, including Pyrola asarifola, Pyrola chlorantha, Cornus canadensis, and Arctostaphylos uva-ursi. In addition, bryophyte and lichen species were also identified, including Pleurozium schreberi, Vulpicida pinastri, and Dicranum polysetum ( Figure 6).
In boreal forests, the response to increased N deposition is often characterized by a general decrease in species composition and species abundance [19]. However, several studies have identified that these declines are often species-specific for vascular [55,77,78], bryophyte, and lichen species [21,77,[79][80][81]. This response can vary depending on other factors, including different N treatments, landscape management, or differences in stand structure or age. However, similar to our results, other studies have reported a strong response in common bryophytes such as Pleurozium schreberi or Dicranum polysetum across comparable boreal forest habitats [77,79,80,82]. In one study, biomass for D. polysetum and P. schreberi declined considerably (by 60% and 78%, respectively) following a 4-year N addition treatment in Finland [83], while in a separate study, P. schreberi was almost entirely absent in long-term study plots following medium and high N-addition treatments in northern Sweden [84].
In our study, both the density plot and the cumulative importance plots suggested that the TDN threshold of 400 eq ha −1 yr −1 (5.6 kg N ha −1 yr −1 ) was significant for overall community-level change ( Figure 6). Across similar boreal forests in Europe, an empirical critical load for nutrient N ranging between 10-20 kg N ha −1 yr −1 was proposed based on changes in soil processes, in both ground vegetation and mycorrhiza, nutrient imbalances, and increased susceptibility to parasites [85]. However, this range was broadly revised to 3-15 kg N ha −1 yr −1 for all coniferous forests [6], with one study suggesting a critical load of 6 kg N ha −1 yr −1 for boreal forests in Sweden [80], while for the AOSR a range of 5-10 kg N ha −1 yr −1 was recommended [60]. In line with our findings, the critical loads established in Sweden were also based on the decline in abundance of several key Vaccinium species due to deposition exceeding 6 kg N ha −1 yr −1 [80]. Similarly, critical loads in the AOSR were derived from the changes in ground vegetation and reduction in lichen species [60]. These studies provide further support for the biodiversity-based empirical critical load of 5.6 kg N ha −1 yr −1 that we found in our study.
From the cumulative importance plots, species that were associated with important thresholds across the TDN gradients also appeared to be among the best-predicted species for the full model ( Figure 2, Table 3). Gradient forest analysis identified seven species that were particularly well predicted by the full suite of environmental variables, five of which were vascular species, including Elymus innovatus, Carex concinna, Arctostaphylos uva-ursi, Aster sibiricus, and Rubus pubescens (R 2 > 0.4, Figure 2). The two best predicted species in this model were Hylocomium splendens, a bryophyte, and Elymus innovatus, a grass species (Figure 2), with both also identifying TDN as a main driver of change ( Table 3). Several of these species, including Carex species, R. pubescens, H. splendens, and E. innovates, have previously been identified as N indicators in the AOSR [60], and all but H. splendens are known nitrophiles that are capable of effectively utilising higher levels of N [2,[86][87][88]. Hylocomium splendens is also known to decrease in cover following N treatments [80,84], with one study noting declines for this species begin above 10 kg N ha −1 yr −1 [77]. In our study, H. splendens experienced a notable response at relatively lower TDN levels (between 0-100 eq ha −1 yr −1 or 0-1.4 kg N ha −1 yr −1 ; Figure 6). However, thresholds identified at the extremes of a gradient can often represent only a small fraction of data sites, and as such, this threshold was not evaluated further to avoid any potential bias [89].
In addition, Arctostaphylos uva-ursi (Bearberry), a vascular shrub found in abundance across the boreal forest, was among the top-predicted vascular species in the full model ( Figure 2) and was also highly associated with the TDN gradient ( Figure 6). Several studies have observed a decline in abundance for A. uva-ursi in response to N additions [60,90]. However, in a more recent study, greater productivity was identified for A. uva-ursi in response to increasing N addition across fertiliser treatments [91], indicating that N was not directly causing a loss of this dwarf shrub. Rather, as cited in other studies, N deposition, even at low levels, favours some fast-growing, nitrophilic, vascular species that often include many grass or sedge species, such as E. innovatus [79,90,92]. As these grass species experience an increase in cover, they begin to outcompete slower-growing, low-lying, prostrate shrubs such as A. uva-ursi for resources, including light [3,21,87,91,93]. Apart from the outcompete model between fast-growing grass species and slower-growing shrubs, there may be other mechanisms that could explain the different responses observed in different species along the gradient [9,23,77,78,94]. Additional studies on these driving mechanisms and the long-term response for specific species identified in gradient forest analysis may be beneficial in characterizing this relationship.
While the contribution of S deposition to increased soil acidity and its resulting effects on an ecosystem have demonstrated its importance in community composition, S was not the primary focus of this research. However, it should be noted that across the TDS gradient, one significant community threshold appeared at approximately 900 eq ha −1 yr −1 (14.4 kg S ha −1 yr −1 ; Figure 6), which is relatively consistent with results citing declines in lichen occurring at similar S deposition levels [95].

Limitations and Uncertainties
One of the primary limitations was the availability of data. Although there were many environmental predictors available for this analysis, these are not the only variables responsible for changes in plant species composition. Other factors, including climate change, land-use intensification, historical events, species interactions, temporal variability, or even fire history [43,96], may have ecologically significant roles in species composition. In addition, for this analysis, variable reduction methods were followed to reduce predictors if they were highly correlated and shared similar environmental processes. However, in a separate gradient forest study, the selection of predictor variables was responsible for a low overall explained variation [43], also observed here. Furthermore, the resolution across the data in this study changed considerably. While both species abundance data and soil-chemical variables represented site-specific measurements, modelled deposition estimates had a broader resolution (2.5 km × 2.5 km), which may not have been fully representative of true site-specific deposition. Given these differences, it is possible that the gradient forest analysis may not have been able to accurately capture the full model variability.
Moreover, in this study, we were faced with a common situation in ecological modelling where the number of sites or observations was only marginally higher than the number of predictors considered. Generally, this can result in poor model predictions. However, random forests and subsequently gradient forests are known to be able to handle model overfitting or underfitting reasonably well [31,32,45,97]. In a gradient forest, both the built-in randomization process referred to as bagging, and the repetition of a single tree to create a forest are meant to improve the model's generality, avoid over-fitting, and create a model with high performance. However, additional studies could take a different approach following variation partitioning, as the use of categorical layers of predictors in a gradient forest might improve model performance.
Finally, as mentioned with respect to the TDN gradient, any community thresholds identified at the extremes of a gradient, representing few data points, can potentially be biassed in gradient forest analysis [89]. While our results found a community threshold for S that was consistent with other studies [95], there is a possibility that this represents one such example of bias as the community thresholds were located close to the extremity of the gradient (S range: 23.2-1182.1 eq S ha −1 yr −1 ).

Conclusions
In this study, partition variation provided a wide-angle view of the relationship between soil chemistry, bioclimatic, and deposition variables and the variation in plant species composition. This was followed by gradient forest analysis to further define this re-lationship through species-level and community-level responses across multiple gradients. Gradient forest provided a multivariate analysis across multiple environmental variables, which is an advantage compared to other gradient methods, including TITAN or regression analysis, that only consider a single predictor. Consistent with other studies, soil chemistry variables explained the largest amount of species variation among predictor categories; however, joint effects between variables were also relatively important. Plant species responses across the nitrogen deposition gradient highlighted that different species are susceptible to different thresholds across a single gradient and may be driven by different mechanisms. Across the nitrogen deposition threshold, there was a mix of vascular and bryophyte species that experienced important community-level changes, including A. uva-ursi, several Pyrola species, C. canadensis, P. schreberi, D. polysetum, and V. pinastri. Similarities between our results and other studies suggested that community thresholds identified under gradient forest analysis were ecologically relevant and supported a biodiversity-based empirical critical load for nutrient nitrogen of 5.6 kg N ha −1 yr −1 for Jack pine forests in the AOSR. While the 14.4 kg S ha −1 yr −1 threshold was consistent with previous work, there was uncertainty in this threshold given the potential for bias at the extremities of the gradient. Further studies could benefit from performing gradient forest analysis on individual categories of predictors to better define the variability seen in species and include a wider breadth of data, such as fire history or frequency across the landscape.