Site productivity indices for native forests in southeast Queensland

Site productivity, or site quality, describes the potential biomass growth and yield of vegetation at a given location. Land managers have devised indices for site productivity using attributes related to plant yields or growth rates, and these have great utility when available spatially in maps. The main factors determining site productivity include climate, soil and terrain characteristics. Here we analysed four productivity indices (two based on remote sensing only, two based on modelling and algorithms using spatial datasets). The tested indices were available over a 150,000 km2 area of southeast Queensland Australia, a region dominated by Eucalyptus and Acacia species. We were interested in comparing the indices regarding underlying drivers, effects on vegetation types and the general distribution of site productivity across our study region. Our methods included histograms of spatial attribute intersection, and multivariate linear regression. Remote sensing has clear advantages in capturing current conditions, which potential productivity algorithms cannot depict. On the other hand, maps with productivity algorithms provide large-scale robust information on biomass growth/yield that is sensitive to the main drivers of plant growth (e.g. climate, soil).


Introduction
Site productivity (also called site quality) describes the biomass production potential of vegetation limited by local site conditions, including climate, water availability, solar radiation, physical soil characteristics and nutrients supply. Robust information on site productivity is critical for land management, for grazing, agricultural and forested land [1][2][3]. Historically, one of the first sources for site productivity information were forest yield tables that are still used worldwide [4][5][6][7]. Yield tables often provided information on site class or bonity, that is, the wood volume production per unit area per year for a particular timeframe (e.g. m 3 ha -1 year -1 at age 100). Other sources use maximum stand basal area (m 2 ha -1 ), as a more easily measured metric [8] compared to increment or production, that require repeated measurements, increment cores or growth models [9,10]. Basal area can be quickly measured with basal area sweeps, so called Bitterlich samples [11], and used as an allometric estimator for above ground biomass [12], as an input for forest growth models [13] and as guidance for thinning or selective harvesting [14].
Other site productivity information can be calculated from field measurements [15,16] and estimated from remote sensing [17,18]. Models are now increasingly used to obtain wall-to-wall maps of  [19], supported by the advent of consistent remote sensing data with high spatiotemporal resolution, including NASA's Landsat and Terra and Aqua satellites and Sentinel missions by ESA [19][20][21]. The benefits of spatial site productivity data are the efficiency of data collection and consistent data quality. For sparsely populated areas with low site productivity, combined models with remote sensing is often the only viable source for productivity information. The Australian forest inventory is underpinned by a productivity map, obtained from climate data, remotely sensed vegetation cover and a biophysical model [23,24]. The primary metric of this Australian productivity map is net primary production (NPP gC m -2 year -1 ), similar to the current global productivity metric based on temporal MODIS satellite data [25,26].
We used here available datasets for site productivity for a study region in southeastern Queensland in Australia, including the Australian net primary productivity map, NPP [24], remotely sensed Foliage projected cover, FPC (i.e., percentage of ground area occupied by the vertical projection of foliage) using multispectral data [27,28], remotely sensed 95 th percentile of canopy height using LiDAR data, HT95 [29] and a new site quality index SQ [30], which follows concepts similar to those formulated by [24,31,32]. Despite the differences in origin and conceptual approaches, all these datasets (NPP, FPC, HT95 and SQ) should describe site productivity. To date, no comparison of these datasets have been made and it is unclear which dataset is more suited to evaluate plant production. The objectives of this study are, (1) to show the distribution of productivity index values within native forest vegetation types, (2) to analyse relationships between site quality indices and (3) to explore drivers and variable importance of environmental variables on the tested site quality indices.

Study Area
The study area is a 15 million ha region of southeast Queensland, Australia, situated south and east of latitude/longitude (-24.5°, 149.4°). Figure 1 shows the study area and main vegetation types. Appendix A contains maps depicting the variables used in the study. The climate is subtropical with summer rainfall, highest near the coast. The terrain is mostly low elevation <300 m above sea level except for a central range of hills at 400-800 m running N-S, highest in the south at 800-1000 m.

Materials and Methods
All data was analysed as raster values. Forest vegetation types (VEG) were converted from the polygon to the raster format on the equivalent scale. The spatial resolution for all rasters was 3 arc seconds latitude and longitude, or about 85 m between pixel centres in the central part of the study area. Some component layers and the index NPP were originally created at lower spatial resolutions, e.g. rainfall at 30 arc seconds, NPP at 9 arc seconds, and these were converted as required to match the standard 3 arc seconds. The HT95 map layer was clipped to match FPC, so these two layers contain year 2014 existing vegetation only. All other data in the study area includes both cleared land and vegetated land. We summarized the used data in Table 1. All indices and variables were modelled in some way, e.g. spatial interpolation from samples, rather than obtained by direct measure. FPC and HT95 were obtained by remote sensing, and adjusted for historical FPC series and some topographic anomalies (cliffs, landform shading, etc.). Appendix C contains details of the remote sensing measures HT95 and FPC.

Vegetation types
We used a GIS map layer of pre-clearing native forest vegetation types 'VEG' [33] to assess relationships between vegetation type and site productivity. We focus on the major forest types of Rainforest, Eucalypt and Acacia and use here eleven VEG type categories (table 2). Minor forest types in the study area, including Melaleuca spp. and Callitris spp. woodlands, heath, and native grasslands dominated by Astrebla spp. and other species (altogether 7.6% of the total land area) are not included in the analyses. Much of the forest has been altered or cleared over the last century or more. We have not filtered the VEG mapping to remove cleared land. The reader may compare VEG with FPC or HT95 maps to see extent and density of year 2014 forest vegetation (see appendix A).
We describe and compare the distribution of index and variable values for each of the 11 VEG types in table 2 and figure 1. From the 11 considered VEG types, types 2, 209, and 210 have the highest yields and timber productivity (table 2).

Site productivity indices and environmental variables.
The GIS rasters encompassed a gross area of 20.9 million pixels. We converted all raster layers (apart from VEG) to 7 classes in order to facilitate comparisons and analyses. Classes are based on the percentiles of the range of pixel values found in the original layer, and are numbered 1-7 in presumed ascending order of their effect on productivity. Note the class size intervals are not equal. These classes of variable are defined in table 3. The gross study area of 20.9 million pixels is defined by the SQ map (appendix A). Eleven mapped VEG types cover 19.4 million pixels, including both cleared and currently forested land. The land with recent vegetation measures (year 2014 FPC and 2009 HT95 layers intersection) covers 11.8 million pixels. The FPC layer does not include cropland and water bodies, and 99% of FPC pixels have greater than 9% cover. We removed pixels in the HT95 layer with canopy heights smaller than 5 m, and then filtered the result to further remove any remaining pixels with zero FPC. Some minor areas (about 0.2% of pixel count) of low height and low FPC, eg regrowth clumps, are still included in the results.
We then intersected the four site productivity index layers (SQ, NPP, FPC, HT95) with a CASE layer of combined environmental variables (Climate, PSMI, Sfert5, DES and TWI) using the nominal classes 1-7. "CASE" means a 5 digit code from 11111 to 77777, where each digit represents the nominal classes for the five environmental variables (Climate_PSMI_Sfert_DES_TWI) respectively. This analysis was done for every pixel in the study area. Pixels with a zero or missing value for any variable or index were excluded.
We then counted the instances of each CASE, which intersected with a pixel in each site productivity index class 1-7. We did this for all land combined, and then also for each VEG type separately. The sum of pixel counts for each productivity index class 1-7 were then used as weights to account for the number of samples per CASE in the regression model.

Analysis of drivers using linear multivariate regression
NPP [24] and SQ (appendix B) were derived using environmental variables, while FPC and HT95 are based solely on remote sensing data. We were interested in understanding the drivers of the analysed site productivity indices and used multivariate regression models to test whether environmental variables have linear relationships with site productivity. To this end, we used two model types, one with untransformed covariates (equation 1) and one with log-transformed covariates (equation 2). Y = k + a * Climate + b * PSMI+ c * SFERT +d * DES + e * TWI (1) Y = k + a * log(Climate) + b * log(PSMI)+ c * log(SFERT) +d * log(DES) + e * log(TWI) (2) where Y is a class value 1-7 for SQ, NPP HT95 or FPC, and k, a, b, c, d, and e are fitted coefficients using weighted regression and R statistical software. The weighting factor was the number of pixels for each CASE within a given productivity class. For instance, for CASE 11111 and SQ = 1, the number of pixels for all land is n=1703. Similarly counts within VEG type were used as weights when analysing the productivity index within VEG. For example, CASE 11111 and SQ = 1 occurred in five VEG types, 210, 212, 213, 5 and 10, with pixel counts of 19, 334, 2, 498 and 452 respectively (sum= 1305). The difference 1703 -1305 is land which is not one of the 11 mapped VEG types. We included HT95 and FPC for regression analysis, even though these metrics are affected by past history not just environmental factors; for example, forest regrowth may have considerably lower height and vegetation cover than the potential state for the site (as captured by SQ and NPP).
We also fitted linear models (equation 1) by vegetation type. With this analysis we can determine which variables are most important determinants for the tested index, and whether there are discernible and significant differences in site productivity drivers for the various VEG types.
We investigated the potential for the use of principal components analysis (PCA) for illustrating relationships between CASES and productivity indices for the VEG types. However, the matrix sizes proved too large for calculation with 10-20 million samples (pixels) with ~ 10,600 non-zero CASES for each VEG type and index combination. The histogram layout in the VEG results follows a manually sorted order of VEG groups, L->R with descending proportion of high site quality SQ. Since Climate and PSMI are covariates of SQ, the VEG group display order also follows an approximate descending order of pixel counts for these variables.

Vegetation types and site productivity
First we illustrate the distribution of the four tested productivity indices for each vegetation type (figure 2). The VEG type with highest SQ is the (current or former) Wet Eucalypt forest (VEG type 2), with the median SQ class 6 (about 46.9 m 2 /ha). Somewhat surprisingly, much of the Rainforest type (VEG 1) falls into SQ class 4 (SQ about 36.6 m 2 /ha), which is similar to the drier Eucalypt types (209, 4, 211, 210 and 213). This probably reflects the inclusion of the wide-spread inland "dry scrubs" in the rainforest category. The two productivity indices SQ and NPP had a rather similar distribution within VEG types, in particular for the more productive vegetation types.
The two remote sensing measures, FPC and HT95, which consider only current vegetation, have a similar pattern left to right with decreasing SQ. High SQ and high NPP is in general associated with high FPC and high HT95. The distributions of FPC and HT95 tend to match within VEG types, except for Plains type 5, which has some higher cover and low heights.
We noted that these maps are created from different data sources. Most vegetation types are found with a large range of FPC and HT95. Only the less productive vegetation types (Mixed, Temperate, Plains and Brigalow) are limited to classes 1-5. The two vegetation types with highest average SQ distributions within VEG types, for example types 1 and 2, have both high FPC and HT95, and the low SQ type 10 has both low FPC and HT95. VEG type 1 (Rainforests and scrubs), exists with a broad range of FPC and HT95, due to the variable natural conditions in this group and the historic significant clearing for agriculture and current regrowth.  (for details  see table 3). Figure 3 shows the distribution of environmental variables within the 11 VEG types. The dendrogram at the base of figure 3 shows groups of VEG types with the most similar distribution patterns of pixel counts. Similarity was determined using hierarchical classification analysis (HCA) of the data in figure  3, after transposing the rows and columns (11 rows x 35 columns).

Vegetation types and environmental variables
Climate and PSMI follow the same pattern as the site productivity indices shown in figure 2. Soil depth (DES), terrain wetness index (TWI) and soil fertility index (Sfert) do not fit this mould. For instance, the high SQ, high productivity wet eucalypt forests (type 2) are predominantly found on sites with low soil depth (median or below), on drier parts of the topography (ridges and slopes), and on less  Only low productivity Plains (type 5) and Brigalow (type 10) vegetation have considerable number of pixels with DES greater than 60 th percentile. These western areas have low rainfall, and thus a low Climate class, which means low productivity despite having deeper soils. The soil depth in these cases is a result of sandy depositions (Plains) or in situ formation (Brigalow) on clay soils which swell or shrink with the seasonal wet/dry cycles (Vertosols), and result in the incorporation of organic matter to significant depth in deep soil cracks.
A similar pattern can be observed for TWI. The more productive forest types in the near coastal regions occur on hilly land with slopes and ridges, whereas the drier inland and relatively flat areas receiving run-on, do not support productive forest because rainfall is low and evaporation is high.
The highest site fertility values are found for vegetation types on Basalt, since the Sfert index reflects the importance of P as a scarce nutrient in most Australian soils. However due to the location away from the higher coastal rainfall zones ( figure 1, appendix A), the Basalt vegetation types (type 211) have only average SQ compared to other vegetation in the study region. Some Wet Eucalypt and Rainforest types also occur on basaltic parent material, and will have some of the highest SQ in the study area.
Next we checked using equation 1 whether environmental variables had different importance in explaining different SQ within each of the eleven vegetation types (table 4). We found there were no substantial differences in the relative importance of the variables by vegetation type (table 4). Only the Temperate vegetation type 215 deviated, with more important TWI, while for all other vegetation Climate was most important. Figure 3 suggest that TWI has a wider variance and range than the other variables within Type 215, and we note above that Type 215 also has the lowest R 2 and a large intercept value. This suggests that a linear model is not a good representation of the SQ algorithm for the temperate vegetation type, and/or that the vegetation type itself has high internal variability, which is not well described by the mapping. Figure 2 showed that wet eucalypt and coastal eucalypt vegetation types (2 and 209 respectively) have a high proportion of high SQ and NPP classes. These vegetation types are also most strongly associated with high climate and high PSMI classes ( figure 3). Table 4 however shows lower coefficient values for climate as a driver of SQ within vegetation types 2 and 209, and this arises because there is a relatively narrow range of climate and SQ in those types. Conversely, where climate variation is high within the vegetation type, for example for rainforest and Corymbia (types 1 and 210; see figure 3), then climate has the highest regression coefficient (table 4).

Site productivity and environmental variables
Now we illustrate the relationships between the indices and variables, for all land in the study area. Figure 4a shows that NPP and SQ are positively related, but not strongly. SQ and NPP both have weak positive relationship with HT95 and FPC. The strongest relationship in the matrix appears to be that sites with high cover tend to have tall trees, and vice versa. Figure 4b shows the strongly linear association between climate and SQ, and to a lesser extent between SQ and PSMI. The associations with NPP are also linear, but not as strong as for SQ. Distribution of height and vegetation cover are also associated with Climate and PSMI but more weakly because of the inclusion of non-mature vegetation. The three landscape variables (DES, TWI and Sfert) do not appear to be strongly positively associated with any of the productivity indices. Figures 2 and 3 indicated pronounced differences between the vegetation types in the study region regarding site productivity and the environmental variables. We were next interested in the underlying drivers of varying site productivity. We use equations 1 and 2 to compare the relative importance of environmental variables in the four tested site productivity indices, assuming linear (equation 1) and non-linear relationships (log-transformed, equation 2).

Drivers for site productivity
We have not shown standard error of the residuals (Residuals SE), because the software output using weighted samples does not produce meaningful results in this case. Instead we checked the residuals by applying one of SQ equations (linear model from table 5, equation 3) and found that 95% of residuals for SQ class were between -1 and +1, meaning that SE is <<1. SQ = -2.046 + 0.7164 * Climate + 0.3053 * PSMI + 0.2815 * DES + 0.2276 * TWI + 0.1426 * Sfert (3) All models had low p-values (table 5). Our results further indicated, that log-transformed models (equation 2) were not more correlated with site productivity than the linear models (equation 1), as evident from general higher coefficients of determination (R 2 ) for linear models. Highest R 2 was observed for SQ and NPP models. NPP and SQ are both expressed as single numerical index, and are the product of multiplicative algorithms and variables which follow that of [28] in calculating Net Photosynthetic Index, viz. NPI = FPC * Moisture * Fertility * Light * Temperature. It is then somewhat surprising, that equation 2 did not provide a better fit than linear formulation (equation 1), since the multiplicative (log) form more directly reflects the source algorithms (appendix B) [24]). We noted however in table 8, that the differences in R 2 for SQ using equation 1 and equation 2 are relatively minor, and that the order of importance of variables is largely unchanged.
Remotely sensed canopy height (HT95) was more correlated with the used environmental variables than was vegetation cover (FPC), although R 2 for both are quite low, (R 2 <0.33 for HT95 and <0.15 for FPC), because height and cover are an outcome of history not just environment.
Climate was by far the most important environmental variable for all of the used site productivity indices. NPP is less strongly related to Climate than is SQ ( figure 4b, table 5). This was expected since Climate per se was a variable used directly in the SQ algorithm (see appendix B). NPP uses similar components to those in Climate, but with a different algorithm. Second most important driver of SQ and NPP was PSMI, the ratio of evaporation and precipitation. We expected that Climate and PSMI have some quite high autocorrelation since both include rainfall, evaporation, solar radiation and temperature as input variables, these being calculated monthly and summed to an annual value. When separated to 11 VEG types (table 4), the relative importance of these variables for productivity class remains similar to that for the study area taken as a whole (table 5).
TWI has almost no relationship with NPP (table 5), probably because the nearest analogous variable in NPP was soil water holding capacity value based on modelled soil depth and estimated plant water availability on a 250 m grid scale. At this scale and using very broad scale soil type mapping, the topographic influence on forest productivity would not be visible.   Table 5. Multiple linear regression results for four site productivity indices and using two model types linear models (left columns) and log-transformed models (right columns). We show intercepts and coefficients for the five environmental variables, separately for each of the four tested indices. At the bottom of the table we show model statistics, p value and coefficient of determination (R 2 ). The length of the horizontal bars highlight differences in the relative importance of the covariates. Green bars indicate positive correlations (e.g. higher climate, higher SQ) and red bars indicate negative correlations (e.g. lower site fertility, higher HT95).
Sfert has lowest coefficient values for SQ across the board, meaning that variation in Sfert is not strongly correlated with SQ at the landscape scale for any of the VEG types. Nevertheless, differences in soil fertility index will be associated with SQ changes at the localised or stand scale. The secondary variables are also important when Climate is not highly favourable; for example, it can be seen in Figure  4a that the medium SQ class 3, with mostly low Climate and PSMI classes, has a higher than average proportion of sites with class 4 and over for DES, TWI and Sfert.
While soil fertility index Sfert was weakly but positively correlated with SQ and NPP, we observed a negative correlation between Sfert and both HT95 and FPC. This suggests that fertile pixels have a higher SQ and NPP, but lower canopy height and vegetation cover (keeping all other variables constant). Negative or neutral coefficients also occurred for DES and TWI with height and cover. It appears sites with high soil fertility index, deep soils, and to a lesser extent the lower (wetter) points in local topography, are now associated with low height and cover, because they were preferentially cleared of mature native vegetation for agricultural use.

Conclusions
This study demonstrated the relative importance of environmental variables in site productivity indices, over a large region (150,000 km 2 ) in southeast Queensland containing diverse subtropical climates and vegetation types. We found that some forest vegetation types of the study area occur in distinct parts of the site productivity range. For forests, robust information on potential basal area as m 2 ha -1 , or carbon accumulation rate (growth) as tC ha -1 year -1 is especially important for assessing many of their diverse functions. Remote sensing can be a powerful tool to complement site productivity information; for example, we found that lower heights and lower vegetation cover based on remote sensing data were associated with deeper fertile soils, indicating that these land types have been relatively more cleared.
There are now several site productivity indices available, each with specific advantages. We found that the environmental variables which most strongly determined the productivity index values across all land, are ranked as follows: Climate > PSMI > DES~ >TWI >Sfert. Our observations illustrate that Climate and soil moisture are the predominant associates of forest productivity indices at a regional scale. Soil fertility and landscape factors do not have much effect where climate and water availability are favourable, but will nevertheless be important determinants of productivity at the localised scale of one hectare of less. monthly data in this collection are available at 3 arcsecond resolution as single (mosaicked) grids for Australia in TIFF format. The 3 arc-second resolution versions of these radiation surfaces have been produced from the 1 arc-second resolution surfaces, by aggregating the cells in a 3x3 window and taking the mean value. The 1 arcsecond tiled data can be found here: https://data.csiro.au/dap/landingpage?pid=csiro:9530 . The 1 arcsecond mosaic data can be found here: https://data.csiro.au/dap/landingpage?pid=csiro:18851 c. WCLIM Temperature and rainfall http://www.worldclim.org/current SQ data input included mean monthly precipitation evaporation, mean monthly temperature minima and maxima all at 30 arc seconds (30") gridded resolution (approx. 1 km grid). The GIS grids contained data with means 1950-2000, which comprised the antecedent version of this. We created a new global dataset of spatially interpolated monthly climate data at a 1 km 2 resolution, including monthly temperature, precipitation, solar radiation, vapor pressure and wind speed. Interpolations utilized MODIS satellite imagery, and instead of using a single model formulation for the entire world, we constructed the final product by selecting the best performing model for each region and variable. Satellite data improved prediction accuracy for temperature variables, particularly in areas with a low station density, but improvements were marginal for other variables, highlighting the importance of dense, high-quality climate station data.
Source data is available here https://worldclim.org/data/worldclim21.html Worldclim.org is a site which compiles and distributes data from various sources.
Historical climate data. This is WorldClim version 2.1 climate data for 1970-2000. This version was released in January 2020. There are monthly climate data for minimum, mean, and maximum temperature, precipitation, solar radiation, wind speed, water vapor pressure, and for total precipitation. There are also 19 "bioclimatic" variables. The data is available at the four spatial resolutions, between 30 seconds (~1 km 2 ) to 10 minutes (~340 km 2 ). Each download is a "zip" file containing 12 GeoTiff (.tif) files, one for each month of the year (January is 1; December is 12). Variables : minimum temperature (°C), maximum temperature (°C), average temperature (°C), precipitation (mm), solar radiation (kJ m -2 day -1 ), wind speed (m s -1 ), wind 10m, water vapor pressure (kPa)