The relative importance of climate and vegetation properties on patterns of North American breeding bird species richness

Recent advances in remote sensing and ecological modeling warrant a timely and robust investigation of the ecological variables that underlie large-scale patterns of breeding bird species richness, particularly in the context of intensifying land use and climate change. Our objective was to address this need using an array of bioclimatic and remotely sensed data sets representing vegetation properties and structure, and other aspects of the physical environment. We first build models of bird species richness across breeding bird survey (BBS) routes, and then spatially predict richness across the coterminous US at moderately high spatial resolution (1 km). Predictor variables were derived from various sources and maps of species richness were generated for four groups (guilds) of birds with different breeding habitat affiliation (forest, grassland, open woodland, scrub/shrub), as well as all guilds combined. Predictions of forest bird distributions were strong (R2 = 0.85), followed by grassland (0.76), scrub/shrub (0.63) and open woodland (0.60) species. Vegetation properties were generally the strongest determinants of species richness, whereas bioclimatic and lidar-derived vertical structure metrics were of variable importance and dependent upon the guild type. Environmental variables (climate and the physical environment) were also frequently selected predictors, but canopy structure variables were not as important as expected based on more local to regional scale studies. Relatively sparse sampling of canopy structure metrics from the satellite lidar sensor may have reduced their importance relative to other predictor variables across the study domain. We discuss these results in the context of the ecological drivers of species richness patterns, the spatial scale of bird diversity analyses, and the potential of next generation space-borne lidar systems relevant to vegetation and ecosystem studies. This study strengthens current understanding of bird species–climate–vegetation relationships, which could be further advanced with improved canopy structure information across spatial scales.


Introduction
Ecologists have long sought to explain the distribution patterns of species diversity, which is an increasingly important research goal as climate change is expected to alter these patterns considerably (e.g. Parmesan 2006, Bellard et al 2012. Animal species richness and abundance distribution patterns have often been explained in terms of environmental variables, including climate and vegetation properties (e.g. Kerr 2001, Ricklefs 2004, Ceballos and Ehrlich 2006, Mittelbach 2010. Past work hypothesized animal species richness is more strongly related to climate and energy (e.g. productivity) at large spatial scales but more closely related to habitat diversity at regional to local scales (Blackburn and Gaston 1996, Currie et al 1999, Deppe and Rotenberry 2008, Kerr 2001, Kerr and Packer 1997, Mittelbach et al 2001. The relative importance of these environmental variables for predicting bird species richness has been explored at various scales (e.g. Cueto and Casenave 1999, Seoane et al 2004, Coops et al 2009, Hinsley et al 2009, Hansen et al 2011, Bar-Massada et al 2012, Fitterer et al 2012, but is still not well defined across scales. This is partly because measurements of vertical habitat heterogeneity and canopy structure (i.e. the horizontal and vertical distribution of canopy elements) have not been widely or consistently available across a range of locations, despite having long been recognized as important determinants of microclimates and microhabitats, and consequently the abundance and distribution of species (MacArthur andMacArthur 1961, Whittaker et al 2001).
Light detection and ranging (lidar) is advancing fieldbased understanding of these ecological interactions by providing novel information on spatial and vertical canopy structure and the impact of this structure on habitat diversity and quality (Bergen et al 2009, Lefsky et al 2002, Turner et al 2003, Vierling et al 2008. There is now a rapidly growing number of studies utilizing airborne lidar for biodiversity research, and many studies have found moderate to strong relationships between remotely sensed vegetation canopy structure, habitat quality, species richness and abundance, and habitat use (Bradbury et al 2005, Clawges et al 2008, Goetz et al 2010, Hill et al 2004, Lesak et al 2011, Seavy et al 2009, Swatantran et al 2012. However, these studies have been limited because airborne lidar data is generally not available over large areas. Consequently it has been difficult to test hypotheses relating the impact of variations in canopy structure to bird species richness patterns over large scales in a systematic fashion. A unique global vegetation canopy structure data set was acquired from space by GLAS (GeoScience Laser Altimetry System) onboard ICESat (the Ice, Cloud, and Land Elevation Satellite) from 2003 to 2009 (Schutz et al 2005). The GLAS data set provides the first opportunity to explore the importance of structure relative to other vegetation properties over large spatial extents and across broad gradients in climate and other environmental factors.
Here we investigate the patterns of breeding bird species richness across the coterminous United States and southern Canada, a domain established by the existence of welldocumented bird observation data sets provided by the North American Breeding Bird Survey (BBS) (Pardieck andSauer 2007, Ralph et al 1995) (figure 1). We predict species richness using models that incorporate a range of environmental and vegetation variables, and assess the relative importance of different influences on bird richness patterns across the study domain. Important aspects of this work include; (i) derivation and use of a range of satellite remote sensing variables, including vegetation canopy structure metrics, (ii) application of climate data that capture extremes and variability as well as mean conditions, (iii) application of advanced machine learning methods (ensembles of regression trees) for data mining and predictive model development within buffer zones surrounding BBS routes, (iv) robust model results and cross validation, (v) production of continuous, relatively high resolution (1 km) maps of species richness for various bird habitat guilds. Whereas some of these topics have been explored in past biodiversity research (see above), few have incorporated all of these elements or considered the relative importance of factors driving species richness patterns consistently across large spatial extents and a diversity of habitat types.

Methods
We assessed 47 bioclimatic and vegetation predictor variables of bird species richness (our response variable) for different guild types and in aggregate. We explored statistical associations and the strength of Pearson correlation coefficients for each predictor variable with species richness. We also used Random Forests, a regression tree method that calculates the average of an ensemble of individual trees using bootstrapped sampling (Cutler et al 2007), to predict species richness. When compared against other species distribution models, Random Forests has consistently performed well (e.g. Lawler et al 2006, Cutler et al 2007. One advantage of the random forest method is that provides us a mechanism for interpreting how important each of the predictor variables are (i.e. their explanatory power) and how results change as predictor variables are changed (so-called partial dependence plots, described below).
2.1. Bird species richness variables 2.1.1. Breeding bird survey. We used data acquired, compiled and distributed by the BBS, which was launched in 1966 to characterize avian population trends along survey routes through time (Pardieck andSauer 2007, Ralph et al 1995). Bird counts are conducted annually at the peak of the nesting season along ∼4100 randomly selected routes throughout the United States and Canada (www.pwrc.usgs.gov/bbs). Along each 40 km route, a trained observer conducts stops about every kilometer and records all birds heard or seen within a 3-min period. The BBS data are publically available (ftp://ftp ext.usgs.gov/pub/er/md/laurel/BBS/DataFiles).
We note that a potential source of error in using BBS data to derive species richness is the lack of complete detectability of all species that occur along a route (Boulinier et al 1998). Nichols et al (1998) developed a series of estimators based on capture-recapture theory to account for incomplete detection among species. The approach, however, requires assumptions that may not be met on BBS routes (e.g. closed populations, equal detection probabilities along routes) and other studies We used the yearly summary data for each route for the years 2004-2006 (inclusive), which most closely matched the timeframe of the other data sets (predictor variables) with which we were working. Only routes and species with all three years of data were used. Bird species richness was thus derived for each of more than 3000 BBS routes in the coterminous US with a full suite of predictor variable data. We summed bird observations across all stops along each route for all three years. The number of unique species recorded totaled 668. We then used these species richness estimates as the response variable in the model development and analyses that follow.
2.1.2. Bird habitat guilds. We categorized and independently analyzed richness of bird species within breeding habitat groups (guilds) using the system developed by Peterjohn and Sauer (1993). We focus on four habitat guilds (forest, open woodland, grassland and scrub) comprising 69% of the total number of observed individuals and 61% of the total number of species (table 1). These guilds were also most broadly geographically distributed and most relevant in terms of terrestrial vegetation habitat, allowing analyses of the greatest number of bird species across a broad range of vegetated habitat types and ensuring a sufficient number of observations for statistically robust analyses.
2.1.3. Predictor variables. The predictor variables were selected to represent a spectrum of habitat characteristics that could potentially influence the richness of bird species in any particular area. We separated these into three broad categories: physical environment (20 climate and topography variables), Table 2. Naming convention and brief description of the predictor variables and their source. Variables are organized according under broad categories including canopy structure, (other) vegetation properties, and physical environment. Elevation and all of the structure variables are in units of height (m) with the exception of COMP, VDR, Npeaks (unitless), and AreaBE, AreaCD (m 2 ). Units of temperature ( • C) and precipitation (mm) are used. VCF, Life Form, and Tree Leaf metrics are proportional (%). Biomass is (Mg ha −1 ) and NPP (gC m −2 yr −1 ). Temperature seasonality WorldClim bio5 Max temperature of warmest month WorldClim bio6 Min temperature of coldest month WorldClim bio7 Temperature annual range WorldClim bio8 Mean temperature of wettest quarter WorldClim bio9 Mean temperature of driest quarter WorldClim bio10 Mean temperature of warmest quarter WorldClim bio11 Mean temperature of coldest quarter WorldClim bio12 Annual precipitation WorldClim bio13 Precipitation of wettest month WorldClim bio14 Precipitation of driest month WorldClim bio15 Precipitation seasonality WorldClim bio16 Precipitation of wettest quarter WorldClim bio17 Precipitation of driest quarter WorldClim bio18 Precipitation of warmest quarter WorldClim bio19 Precipitation of coldest quarter WorldClim vegetation properties (15 variables), and canopy structure (9 variables). We considered structure metrics separately because they have been the subject of much interest in the context of three-dimensional habitat heterogeneity, as opposed to the two (horizontal) dimensions represented by the other variables, but also because they were derived from a satellite sensor that provides samples as opposed to wall-to-wall coverages (images). These variables are listed in table 2 and are described in more detail below. Because our focus was on physical environment and vegetation, we did not consider detailed predictors of human impact such as housing density, roads or impervious cover. . We used all variables, representing precipitation, temperature, annual bioclimatic trends (e.g., annual precipitation), seasonality (e.g., annual range in temperature) and extreme or limiting environmental factors (e.g., temperature of the coldest month) (table 2). Topographic information was derived from the Shuttle Radar Topography Mission (SRTM) collection 4 data set (Farr et al 2007). The spatial resolution of the SRTM data within the coterminous United States is 30 m. We note that because the SRTM data set is derived from interferometric radar imaging, it does not represent true ground surface topography for areas with vegetative cover but rather values that convolve canopy volumetric scattering with surface elevation. For our purposes these distinctions are not critical, particularly since vegetation properties (including canopy height) were independent predictors.
2.1.5. Vegetation property variables. Vegetation cover type information was derived from SYNMAP (Jung et al 2006), a global land cover map at 1 km spatial resolution that defines vegetation functional groups by life form class, tree leaf type, and tree leaf longevity (table 2). We calculated the proportional amounts of each cover variable, categorized as a particular vegetation functional group, within a 100 m buffer surrounding each BBS route (i.e. 200 m across).
MODIS Enhanced Vegetation Index (EVI) is a spectral vegetation index responsive to canopy reflectance, light absorption and photosynthetic capacity (Huete et al 2002). We calculated EVI from MODIS canopy reflectance products that have been corrected for variability in solar and viewing angles (Ju et al 2010). These data sets are 1 km resolution, as are the other MODIS products we used here, spanning the years on which we focused (2004)(2005)(2006). For our analyses, EVI was averaged throughout the growing season to reduce issues with cloud cover. Growing season start, end and duration were derived from the MODIS phenology products (Ganguly et al 2010).
The amount of tree canopy cover within each 500 m cell across the study domain was derived from the MODIS vegetation continuous fields (VCF) products (Hansen et al 2005) for the years 2003 to 2005. We used VCF collection 4 (C4), version 3, and averaged the values temporally over the 3 year period to match the BBS observations while incorporating interannual variability.
Aboveground live forest biomass of the United States and Alaska was based on a map produced by the United States Forest Service (USFS) that used a combination of FIA (Forest Inventory and Analysis) plot data and MODIS reflectance values at 250 m resolution (Blackard et al 2008).
We also used the MODIS-based net primary productivity (NPP) product, derived from a simplified light use efficiency model using other MODIS products including canopy light absorption and incident radiation (Running et al 2004). We used the collection 5.1 product covering the years 2004 to 2006, at a resolution of 1 km. We used MODIS NPP rather than gross primary production (GPP), as per earlier studies (e.g. Coops et al 2009, Phillips et al 2010), because the two variables were highly correlated (r = 0.9) and NPP was of more relevant to previous work on productivity-diversity relationships (e.g. Mittelbach 2010, Cusens et al 2012).
2.1.6. Canopy structure metrics. Satellite-based lidar data from GLAS were used to derive metrics for vegetation canopy structure. We used GLAS data from shortly after launch in 2003 through 2008. The GLAS laser has an ellipsoidal footprint of approximately 65 m, spaced about 172 m apart along the orbital track. We used the L1A Global Altimetry (GLA01) and the L2 Global Land Surface Altimetry (GLA14) data sets, from which we extracted vertical waveform data, signal beginning, signal end, noise metrics, and other variables. Waveform characteristics from each GLAS shot were analyzed and quality screened (appendix A) to derive a series of metrics that characterize the vertical distribution of canopy elements (leaves, branches, boles). These metrics are summarized in table 2, but some of the more relevant ones include vegetation canopy height (CH), the height of the median amount of energy returned (HOME), and a measure of how the observed waveform diverges from a Gaussian shape after reflectance from the surface (complexity). Specifics on the derivation of these metrics, based on airborne lidar data, are provided in Goetz et al (2010).

Geospatial data processing
To construct predictor variable data sets that describe the average conditions for each BBS route, we established a 200 m buffer around the routes (roughly equivalent to BBS observer detection limits) and spatially intersected them with each predictor variable (figure 2). The mean value of the predictor variables was then extracted and uniquely associated with each route. In the case of discrete predictor variables (e.g. the SYNMAP variables), each BBS route was ascribed the value of the percent of area within the surrounding buffer that was categorized as some particular vegetation functional group. As part of the derivation of the various lidar metrics, we performed extensive screening to remove shots that did not produce adequate waveforms for characterizing canopy structure (appendix A). We then extracted all shots located within the 200 m buffer and calculated the mean value of the various structure metrics for each route. Routes in which there were fewer than ten GLAS shots were excluded from further analysis to avoid inadequately capturing spatial variability.
2.2.1. Random Forests. In Random Forests, we used 500 trees, each built using bootstrapped samples of the data and a randomly selected set of input variables, with an internal unbiased estimate of the generalization error produced (see appendix A). We performed each of these analyses in the R language for statistical computing (R Development Core Team 2009). Only routes where data from each predictor variable was available were used for further analysis. The performance of the resulting ensemble of trees was crossvalidated using 20% of the data set randomly selected and reserved for this purpose. The reserved data were used to calculate standard coefficient of determination (R 2 ) values denoting variance explained by the predictions. Variable importance, as measured by the percent increase in mean square error, was calculated and displayed as partial dependency plots for each predictor variable. Each plot graphically depicts how bird richness (on the y-axis) would be expected to respond to changes from each predictor variable (on the x-axis), independent of all other variables. The Random Forests models for each guild, and in aggregate, were then used to generate maps across the US at 1 km spatial resolution.

Comparisons with USDA Forest Service bird distribution
maps. To independently assess the validity of our predictions of species richness spatial distribution patterns, we compared our 1 km resolution maps with those from Matthews et al (2011) 20 km resolution maps (henceforth referred to as the USFS data set). They evaluated 147 breeding bird species across the eastern United States (Matthews et al 2007, 2011). We selected bird species they modeled with 'high reliability', including 53 species with habitat types that matched the breeding habitat guilds considered in our analysis. We note that Matthews et al (2011) only considered BBS data east of the 100th meridian from 1981 to 1990, whereas the BBS data we used span the entire contiguous US from 2004 to 2006 (inclusive).
The spatial data for each bird species in our maps were reclassified from predicted incidence values (the proportion of routes with species present on a scale from 0 to 1) to presence/absence cells. Species richness values (i.e. the number of bird species per cell) were generated for each guild and results were resampled to 20 km resolution to match the cell size and spatial extent of the USFS data set. Relative species richness was calculated for both data sets, multiplying each by 100 and dividing by the maximum species richness value present. We also compared the predicted habitat center for each guild, i.e. the latitude and longitude of the geographic center of each distribution, weighted by species richness. Lastly, we explored spatial correlations between the mapped data sets, showing where they agreed best and least. More specifics on the methodological approach are described in appendix A.

Predictor variable correlations
The predictor variables most strongly correlated with bird species richness varied by guild type (figure 3). For total species richness vegetation property variables generally had the greatest predictive capability, followed by physical environment (e.g. temperature, precipitation) and canopy structure (see also appendix figure C.1). By guild, vegetation property variables typically had the highest correlations with species richness, with the exception of scrub birds (figure D.1). Considering environmental variables alone (see table 2), those most highly correlated with bird richness were also distinct for each guild. Temperature variables were most strongly correlated with forest bird richness, while precipitation variables were the strongest environmental correlates of open woodland bird richness (figure 3).
The correlation between vegetation properties, structure and bird richness were generally positive for the forest bird guild and negative for non-forest guilds (e.g. grassland). Conversely, the percent of grassland surrounding the route (lfgrass) showed a strong positive correlation for grassland bird richness whereas there was a negative correlation for forest birds (figure 3). Forest bird richness has the strongest correlations with the percent tree cover and also with EVI, percent area with mixed leaf longevity, and percent area with mixed leaf type. Open woodland bird richness was most correlated with EVI, percent area covered by deciduous trees, and percent area covered by broadleaf trees (figure 3). Scrub bird richness was most strongly correlated with bioclimatic variables rather than vegetation properties, although this result was probably influenced by the small range of scrub bird richness (between 0 and 11 species) in the BBS data set.

Prediction models
A combination of vegetation property and physical environment (i.e. climate variables) variables were the most influential predictors of species richness (figure 4). Bird species richness was predicted only moderately well for all guilds (R 2 = 0.34) but the predictive models were strong for forest birds (R 2 = 0.85) and moderately strong for the other guilds (R 2 = 0.60-0.75). The most important variables for predicting forest bird richness were VCF and percent of BBS route area forested (LF tree). Grassland bird species richness was also well predicted by VCF and percent area forested, based primarily on strong inverse correlations of these predictions with richness. Open woodland species richness was most dependent on climatic variables, but also on biomass and VCF. Scrub bird models were the least robust, and only climate (primarily temperature) variables were important predictors.
Further partitioning the data set for forest birds and rerunning random forest models for only those routes with high productivity (NPP > 550 gC m −2 yr −1 ) and high vegetation canopy cover (VCF > 50%) (figure E.1) allowed us to assess which predictor variables were most important for conditions that are more characteristic of densely vegetated, high productivity environments (figure 5). While climate variables were the most important predictors of forest bird richness in these environments, canopy structure metrics were also correlated (figure F.1). Compared to the entire set of forest birds, the species richness values were higher and restricted to a somewhat smaller range, but the cross-validated predictive models were still robust (R 2 = 0.75 for both) (figures 5(a), (e)). The partial dependency plots in figure 5 show how predictor variables co-vary with the response variable (species richness) across their respective ranges. Notably, forest bird species richness in high productivity areas increased with biomass and canopy vertical complexity, particularly above the 20th percentile of both variables, but declined above the ∼20th percentile for all the selected climate variables. Similar results were observed for species richness in areas with high canopy cover (figure 5 lower panels), with declining richness above the ∼20th percentile for all climate variables and particularly rapid increases in richness up to the 40th percentile of biomass and the 50th percentile of vertical complexity values.
Additional partial dependency plots for all the guilds are presented in figure G.1. Notable trends include the systematic increase in forest bird richness with each of the most important predictor variables, the converse relationship for grassland birds (richness declines with increases in about half of the key predictors), and distinctly hump-shaped relationships for both open woodland and scrub birds (with highest species richness at mid-range values of the predictors). These observations were particularly evident for the canopy structure variables (right set of panels in figure G.1).

Bird species distributions and comparisons with USDA Forest Service maps
Maps of species richness derived from the BBS-route-based models (figure 6 right), and the continuous maps derived from the models being applied to the stack of spatial data layers (figure 6 left), both show coherent spatial patterns and strong regional gradients that vary by guild type. The route-based models and the continuous predictions show similar spatial patterns of richness, although the continuous maps reveal considerable spatial variability across the domain, with the highest total richness levels in the eastern US, particularly at higher latitudes. This pattern is even more pronounced for forest birds. High richness of open woodland birds spanned much of the eastern US as well, but was more widely distributed across the region. Grassland species richness, as expected, was particularly high in the west-central US We did not create maps of scrub birds owing to the less robust models for that guild.
Our predicted patterns of relative species richness correlate reasonably well with those of Matthews et al (2011). Overall, the weighted mean center of the species richness Figure 6. Maps of species richness for all birds (top) and for three guild types (forest, woodland, grassland birds). The maps on the left represent the modeled distribution of species richness continuously across the study domain, each developed from the route-level models shown on the right and the suite of geospatial data layers (table 2). spatial distributions (see section 2.2.2) for our study and Matthews et al (2011) was geographically close for each guild type (figure H.1). They were closest for the open woodland guild (9.3 km) and farthest for the grassland guild (179.3 km). There was, however, considerable variability in the spatial correlation patterns. In both, forest bird richness was greatest around the Great Lakes, New England and the Appalachian Mountains. The spatial pattern comparison was least similar for grassland birds, with predictions from the USFS data set being high in the mid-western states and those from ours being concentrated further west. Both maps predicted the highest relative richness of open woodland species in the central eastern US, but the USFS data set predicted a larger expanse of high values.

Discussion
Our results show the various data sets (table 2) used in our analyses can inform models that robustly predict the spatial patterns of bird species richness across large areas, in this case the conterminous USA. While previous studies report similar findings at national (Coops et al 2009, Culbert et al 2013, Pidgeon et al 2007, Rittenhouse et al 2012 and regional scales (Allen andO'Connor 2000, Donovan andFlather 2002), we anticipated the influence of canopy structure captured in the lidar data sets would have greater predictive capability than we observed. This expectation was based on increasing evidence, using lidar-derived canopy structure measurements, that structure is an important predictor of species richness at local to regional scales (Bradbury et al 2005, Clawges et al 2008, Goetz et al 2010, Hill et al 2004, Lesak et al 2011, Seavy et al 2009, Swatantran et al 2012. Instead, our national scale estimates of forest bird guild species richness were most strongly influenced by tree canopy cover (i.e. VCF) and tree life form distribution (LFtree). Interestingly, we show robust predictions of forest bird richness even in areas of high productivity and high canopy cover, where we might expect some predictor variables to saturate in their sensitivity to increasing species richness, and canopy structure to become an important determinant of niche diversity. In these areas of high productivity and canopy cover, however, climate variables became important predictors (figure 5), indicating they capture additional information beyond even integrative variables like net primary production. In the case of woodland birds, a unimodal shape of the partial dependency plots reflects greatest species richness at intermediate vegetation cover, biomass and canopy structure values. This finding is consistent with greater landscape habitat diversity in more open woodland areas relative to, e.g. densely forested or open grasslands areas, and expectations of different woodland bird species utilizing differing vegetation types and densities.
The predicted patterns of relative species richness from our models were generally similar to those Matthews et al (2011) mapped for the eastern US, even though very different approaches were used to generate the maps produced by both studies. There are several differences worth noting. First, although both studies considered climate, elevation, and tree species properties as predictor variables, the other predictors differed substantially between the two studies. In particular, our predictions considered unique vegetation properties (e.g. biomass, productivity) and canopy structure metrics (e.g. canopy height, vertical complexity). We note these were aggregated to coarser resolution (20 km) for comparison with the Matthews et al (2011) maps. In addition, the BBS data sets used in this study and the USDA Forest Service maps were collected more than a decade apart. We expected this to adversely impact the comparison insofar as the time lag might contribute to changes in bird species richness and/or the predictor variables, but the overall similarities between the relative richness maps was an encouraging confirmation of the utility of our higher resolution predicted richness distribution maps ( figure 6). This was particularly true given we used only what Matthews et al (2011) deemed their highest quality species predictions (appendix B), and in the context of the strength of our BBS route-based Random Forests models ( figure 4).
These points noted, the strength of any predictor variable is fundamentally related to the scale at which the metric is most ecologically relevant. Our models tended to characterize forest vegetation type, cover and climate as the most important large area predictors of bird species richness, despite the considerable influence of canopy structure at more local scales. However, the inability of sample sets of 70 m GLAS lidar shots to fully characterize entire 40 km BBS routes complicates these interpretations. Canopy structure can undergo significant changes over the length of the route, and averaging all shots intersecting a route probably does not represent conditions of the entire route. A more direct analysis of bird observations within the footprint of the GLAS shots would better capture the attributes of bird species richness that are dependent on canopy structure and related habitat heterogeneity, but the BBS data were not collected with that intent (i.e. data are aggregated for all stops along the length of the routes). This does not mean that very large-scale studies of habitat structure with lidar are impractical, but it is important to note that GLAS lidar are samples (not wall-to-wall images). Despite that, we found meaningful correlations between GLAS structure metrics and bird species richness when compared in a pairwise manner (figures E.1, F.1). Moreover, the partial dependency plots (figure G.1) show reasonable and expected associations of structure metrics with forest birds (positively co-varying), woodland birds (richness peaking at intermediate values of structure metrics) and even grassland birds (negatively co-varying). Previous work with airborne lidar shows more densely sampled lidar structure metrics are useful for predicting not only richness but habitat use, thus future studies of the relative importance of species richness predictors might explore the effect of spatial averaging of structural variability that accompanies increasing lidar footprint size, as well as sampling density and terrain-influenced height estimation errors in medium to large footprint lidar systems. Such studies would be best designed in the context of potential next generation space-borne lidar systems relevant to vegetation and ecosystem studies.  specifically shots where the amplitude of the raw waveform does not exceed twice the noise threshold. We also excluded shots in which the waveform contained less than two distinct peaks (an indication that the top of the vegetation canopy was not detected), as well as shots in areas with substantial topographic relief, where canopy height estimation was problematic (Lefsky et al 2007). A total of 48,133 GLAS shots remained post screening, which was less than half of the total number of shots acquired and processed.
Random Forests models. We subsequently applied the lidar (i.e. structural information), physical environment, and vegetation properties metrics in Random Forests to assess predictor variable importance. In Random Forests, data are partitioned using a series of hierarchical binary splits on the predictor variables, with the goal of maximizing explained variance in the response variable (in our case bird species richness). A different bootstrapped random sample of the data for each tree is used to optimally split among the various predictors, improving performance by iteratively aggregating the results. Predictors are selected randomly, which acts to increase the accuracy of the model while reducing the influence of variable selection order and, in the process, minimize overfitting (Cutler et al 2007). The combined effects of the predictor variables were assessed using the mean prediction of many individual regression trees.
Generation and comparison of relative richness maps. Species presence/absence thresholds were optimized by Matthews et al (2011), who found an optimal incidence value (IV) cutoff at 0.05 (i.e. grid cells with IV < 0.05 were reclassified as zero, indicating species absence). We subsequently explored the regional correlations between our model predictions (   correlation, which we created by writing a moving window script that derived a regression model for the cell values (from both data sets) within the same 5 × 5-cell window and created a new map with the correlation coefficient in the window centroid. The window iterated across both data sets, producing maps of the spatial correlation between the data sets. The moving window assigned null values in regions where identical richness values (in either or both data sets) populated a 5 × 5 window because the standard deviation in these cases was zero. Relative species richness was calculated for both data sets, multiplying each by 100 and dividing by the maximum species richness value present. We also compared the predicted habitat center for each guild by calculating the mean center of spatial distribution, weighted by species richness (using the Mean Center tool). , and the correlation between the two data sets (right images). Green dots in the relative richness maps indicate the mean geographic center of distribution (MCD) for all birds and each guild type. White areas within the eastern US in the correlation maps indicate null value regions (see appendix A).