20 cm resolution mapping of tundra vegetation communities provides an ecological baseline for important research areas in a changing Arctic environment

Arctic tundra vegetation communities are spatially heterogeneous and may vary dramatically from one meter to the next. Consequently, representing Arctic tundra vegetation communities accurately requires very high resolution raster maps (<5 m grid cell size). However, using remotely sensed data to produce maps with sufficient spatial detail at an extent appropriate for understanding landscape-scale ecological patterns is challenging. In this study, we used predictor layers derived from airborne lidar and high-resolution (∼5 cm) 4-band airborne imagery to classify vegetation communities at 20 cm spatial resolution for three landscapes (12.5 km2 total) near the Toolik Lake research area in the Alaskan Low Arctic. The maps were built using a Random Forest model that was trained and tested on 800 ground reference plots, using classes derived from commonly used legends on existing polygon maps of the area. Withheld test plots (25% of dataset) had a balanced map accuracy of 0.57, kappa of 0.47, and weighted (fuzzy) kappa of 0.65. These maps provide high-resolution plant community information that can serve as important baseline reference data for vegetation monitoring and change detection in this rapidly changing tundra ecosystem, and as validation for coarser scale maps. They also permit fine-scale characterization of landscape phenomena such as community-level nutrient dynamics and wildlife habitat suitability in an important Arctic research site. Our approach demonstrates that very high resolution mapping results can be achieved and validated by integrating high-resolution remote-sensing datasets from multiple sensors in a machine learning model trained on simple field reference data.


Introduction
As the Arctic continues to warm rapidly (ACIA 2005), research efforts seeking to improve understanding of potential impacts on human, animal, and vegetation communities are becoming increasingly urgent. To address these challenges, there is often a need to understand the distribution of features across the landscape-for example, to understand how much area is covered by lakes and wetlands that may release increasing quantities of methane (Walter et al 2007), or which permafrost slopes might soon thaw and erode, impacting local and downstream terrestrial, aquatic, and human communities and releasing ancient soil carbon to the atmosphere (Osterkamp 2005).
The spatial requirements for mapping these landscape features in a scientifically relevant way depend on the ecological change process under study. Coarse to moderate resolution classification maps covering large extents provide valuable information about general surface features (e.g., Walker et al 2005, Beck et al 2011, and can be important inputs to regional and global climate models (Oechel et al 2000). However, the low spatial resolution and usually coarse classification schemes of these mainly satellite-derived maps cannot represent the heterogeneity of Arctic landscapes accurately enough to reproduce the distribution of diverse vegetation community types or fine-scale ecological phenomena. Capturing the natural variability of Arctic tundra surfaces across complex environmental gradients requires surface mapping at very high spatial resolution, generally 5 m or less (Atkinson and Treitz 2012, Davidson et al 2016, Langford et al 2016. For example, very high spatial resolution mapping is necessary for identification of suitable nesting habitat for migratory songbirds that require open, dry hillsides (Boelman et al 2014), and for determining whether individual patches of small shrubs growing along narrow waterways are expanding (Myers-Smith et al 2011). On the other hand, classification maps derived from unmanned aerial systems (UAS) may be of very high spatial resolution, but the spatial extent of such datasets may be restricted by short flight times and other logistical considerations (Whitehead and Hugenholtz 2014), limiting their general usefulness. High-quality piloted-airborne datasets therefore still represent a gold standard, with the potential for producing high-resolution data products at the moderately large spatial extents necessary for understanding landscape-level influences on fine scale vegetation change.
Fusing lidar and passive optical data produces a robust combination of structural and spectral information for a surface, and consequently this has proven to be a powerful approach to quantifying geospatial features (Asner et al 2007, Cook et al 2013. For example, Reese et al (2014) found that combining lidar and optical metrics improved classification of low-stature vegetation in an alpine zone over optical data alone, and Greaves et al (2016) similarly found that this approach produced strong models for quantification of shrub biomass in Arctic tundra.
Our objective was to fuse high-quality lidar and optical datasets to produce tundra vegetation community maps with the high spatial resolution and fine classification granularity necessary to be relevant for the study of fine-scale ecological features at a landscape extent. We used predictor layers derived from airborne lidar and multispectral imagery along with simple field reference data to train a Random Forest model to classify vegetation communities at 20 cm resolution across three research areas near Toolik Lake (12.5 km 2 total), an area of important long-term and ongoing Arctic ecological research.

Study area
The study was conducted near the Arctic Long Term Ecological Research site at Toolik Lake (Alaska, USA, 68°38′ N latitude, 149°36′ W longitude; figure 1). Located on continuous permafrost tundra in the northern foothills of the Brooks Range, the Toolik Lake area has been an important site for North American Low Arctic research since the 1970s. The region is characterized by cold winters (mean coldest month temperature −30°C) and mild summers (mean warmest month temperature 13°C), with roughly 300 mm of precipitation annually (Huryn and Hobbie 2012). Complex topography, glacial features, and prevailing weather patterns (notably scouring southerly winds) generate a complex mosaic of microenvironmental gradients driven by parent material, exposure, and drainage, which in turn host a finely heterogeneous patchwork of low-stature tundra vegetation communities. The dominant vegetation is graminoid tundra (especially Eriophorum vaginatum and Carex bigelowii) with abundant Sphagnum and other mosses and erect deciduous shrubs, mainly Betula nana and Salix species.
2.2. Data collection 2.2.1. Airborne data collection Airborne lidar and high-resolution digital imagery were collected August 1st, 2013. Data were collected in three discrete footprints totaling 12.5 km 2 (figure 1; Vierling et al 2013aVierling et al , 2013b. The largest and westernmost footprint (hereafter 'Toolik', ∼6 km 2 ) covers research areas near Toolik Field Station at Toolik Lake. The other two footprints are each roughly 3 km 2 : the 'Pipeline' footprint (∼3 km east of Toolik Lake) follows a stretch of the Trans-Alaska Pipeline from a high moraine ridge down a sloping drainage toward the Kuparuk River; the 'Imnavait' footprint includes research areas along Imnavait Creek, roughly 10 km east of Toolik Lake. Lidar collection specifications are shown in table 1. The vendor performed pre-processing of the lidar data, including aircraft attitude adjustments and ground control georectification. Simultaneous 4-band digital imagery (RGB and near infrared, hereafter 'RGBN') was collected by a Leica RCD30 60MP camera mounted to the aircraft. The imagery was orthorectified by the vendor and delivered as orthoimagery mosaics with a pixel resolution of roughly 4.25 cm. A GPS check of 40 stationary objects distributed across the Toolik footprint suggested that the imagery was accurately orthorectified to within ∼10 cm.

Reference plot photo collection
Field reference data were collected from across all footprints in summer 2014 and summer 2015 (figure 1). The majority of plot locations (n=592) were established in advance using a stratified random scheme to ensure inclusion of wide ranges of slope, aspect, elevation, and glacial history across all footprints. An additional 47 plots were collected in the course of visiting the stratified random plots, to capture a wider range of shrub characteristics, and 92 more plots were collected in 2015 to represent additional shrub types and vegetation across topographic transects. For each of these 731 total field plots, we took nadir photographs from approximately 1 m height using a standard point-and-shoot consumer digital camera, capturing roughly 1×1.3 m of ground area. We also recorded brief notes at each plot describing plant species and surface features present in the plot and the general area, including vegetation height measured using a meter stick; thaw depth measured using a steel thaw probe; and moss+organic soil depth, measured using a meter stick after slicing a slit in the moss/organic mat with a serrated knife. The center coordinates of each plot photo were precisely measured using a TopCon GR-3 survey grade GNSS system operating in real-time kinematic (RTK) mode (nominal accuracy 4 cm).

Reference plot classification
This area has benefited from previous vegetation classification efforts, in particular by the Alaska Geobotany Center at the University of Alaska, Fairbanks, which has produced a hierarchical series of maps for the area depicting multiple landscape features. These include vegetation maps (http://arcticatlas.org) that provided a  valuable benchmark for this project. Vegetation classes for the Alaska Geobotany Center maps were established using the Braun-Blanquet classification approach (Braun-Blanquet 1965, Walker et al 1994, in which species associations are grouped into a hierarchy of successively broader units. The classification system has the potential to precisely describe the finely heterogeneous vegetation community distribution of the Toolik Lake area, but the maps have limited or absent map accuracy information, and are polygon-based rather than raster, with varying minimum mapping units (i.e., resolution) that necessarily simplify the distribution and arrangement of vegetation patches on the landscape (small-extent maps have the smallest minimum mapping units, 2.5 m 2 ). Since it was desirable that the final map classes in this project correspond as well as possible with legends used on previous maps, preliminary vegetation classes for this project were established by examining communities described in available maps and literature, especially Walker and Barry (1991), Walker et al (1994), Walker and Walker (1996), Raynolds et al (2005), and the related hierarchy of maps, particularly Walker and Maier (2007). To assign each plot photo (i.e., each plot location) to a plant community, photo order was randomized, and each photo was visually examined. Based on plant species and other features visible in the photo, as well as plot notes, each plot location was assigned to the class that best described the center of the photo, where GNSS coordinates had been measured. Because it wasn't yet clear how well we would be able to distinguish vegetation classes in the final model, initial class assignments were as granular as possible, to permit more flexible subsequent grouping. Finally, 69 additional reference plots locations were selected from the orthophotos, to provide more reference data for gravel pads, blockfields (rock), and open water-classes that were underrepresented in the field data but could be confidently identified in the orthophotos.

Input data layer processing
A total of 33 predictor variable layers at 20 cm resolution were derived from the airborne lidar and RGBN imagery using a variety of tools (table 2). Most layers fall into three general categories: lidar-derived canopy metrics (e.g., sum and standard deviation of canopy heights), lidar-derived topographic metrics (e.g., topographic position index, SAGA wetness index), and first-order imagery metrics (e.g., NDVI, percent red, standard deviation of hue). Additionally, there is one compound layer, shrub biomass, which was derived previously using lidar canopy metrics and NDVI (Greaves et al 2016(Greaves et al , 2018 and downscaled to 20 cm resolution for use here. We also included lidar return intensity (pulse amplitude), which acts as a surface metric (Chust et al 2008, Lang andMcCarty 2009). The raw intensity of near-infrared (1550 nm) laser returns is affected by surface characteristics, especially wetness (water absorbs the laser pulse, leading to low return intensity in wet areas), surface roughness (relatively smooth, dry areas may reflect the pulse more strongly), and the presence of green vegetation (which strongly reflects infrared radiation) making it a potentially valuable predictor of surface features and vegetation distribution . Although our intensity data are not radiometrically calibrated or normalized (Yan and Shaker 2014), the intensity response showed few flight-line artifacts, and the nonparametric nature of the Random Forest modeling approach permits the inclusion of raw, uncorrected data. All predictor metrics were selected to be ecologically meaningful for Arctic vegetation distribution, as well as being likely to improve prediction of vegetation characteristics based on previous work (e.g., Ohmann and Gregory, 2002, Dirnböck et al 2003, Naito and Cairns, 2011, Kushida et al 2015, Greaves et al 2016. Owing to the prevalence of noise in data layers with such high resolution, most layers were smoothed using a circular mean Focal Statistics filter (Spatial Analyst in ArcGIS, ESRI, Redlands CA). This type of smoothing retains the original spatial resolution, but tends to improve classification predictions (e.g., Gottfried et al 2014) by decreasing the impact of meaningless local variation, instrument noise, and small shadows. We initially processed layers at several smoothing distances (0.5 m, 0.75 m, 1 m, 1.25 m), but we only retained layers that appeared important in preliminary models and had a Spearman correlation less than 0.95 at the reference plot locations. Once the layers were prepared, the values of all predictor layers at the location of each reference plot were extracted for model training and validation.

Random forest modeling
Random Forest (Breiman 2001) is a machine learning algorithm that uses ensembles ('forests') of classification or regression trees. In a Random Forest, each tree selects and permutes random subsets of predictor variables at each splitting node, which reduces overfitting and improves the strength of predictions. Additionally, Random Forest is non-parametric, models local interactions naturally, and is relatively robust against the collinearity common to ecological variables ( We implemented Random Forest using the package 'randomForest' (Liaw and Wiener 2002) along with model training tools in the package 'caret' (Kuhn 2016) in the R statistical language (R Core Team 2017). Twenty-five percent of the reference data plots (selected proportionally from each class; n=196 plots total) were withheld from model building to be used as a test dataset; the remaining 75% of the plots (n=604) were used to build the model. One model was built to predict all three footprints.
Because the Random Forest algorithm strives to produce the best overall accuracy, it is susceptible to producing poor accuracy for small classes in imbalanced datasets. Since our input dataset was imbalanced, we used an upsampling algorithm implemented in the caret package to ensure equal weighting of each class (Chen et al 2004). The model building was an iterative process: we used out-of-bag error rates and confusion matrices from early models to determine which classes could not be differentiated, then collapsed input classes accordingly (table S1 is available online at stacks.iop.org/ERC/1/105004/mmedia) and retrained the model, seeking a balance between accuracy and retention of informative classes. The final class names and reference plot membership counts are in table 3. We also used tools in the caret package to tune the number of trees and the number of variables selected for splitting at each tree node (mtry) based on 5-fold cross-validation repeated ten times. We found that any number of trees greater than 500 and any value of mtry less than approximately 6 gave largely similar results (data not shown). The final model used 1001 trees and an mtry of 1.

Mapping
After building the final Random Forest model, we applied it across the three data collection footprints using the AsciiGridPredict tool in the R package 'yaImpute' (Crookston and Finley 2007). Once the initial maps were built, pixels in class patches smaller than 25 pixels (i.e., 1 m 2 ) were replaced by the surrounding majority class; therefore, while the map spatial resolution is 20 cm, the minimum mapping area is 1 m 2 . Queen-related pixels (i.e., pixels touching only at one corner) were considered to be connected to each other, so 1 m 2 patches may not appear as cohesive blocks. Minor manual cleanup was performed to correct pixels where infrastructure had been classified as vegetation on the Toolik Field Station gravel pad. Some misclassifications of other infrastructure on the landscape, such as boardwalks, remain uncorrected but are apparent to the human eye.
To evaluate the accuracy of the map, we extracted the final mapped class at the location of each withheld test plot. From these mapped locations, we determined the overall accuracy (i.e., the proportion correct), balanced accuracy (i.e., the mean class-wise accuracy), Cohen's kappa, and Cohen's weighted (fuzzy) kappa (Cohen 1968). Like Cohen's kappa, Cohen's weighted kappa quantifies the overall agreement after accounting for agreement occurring by chance, but it also permits the investigator to subjectively indicate the degree of disagreement present (table S2). Given that vegetation classes are discrete labels that represent gradients of species abundance and structure, and given that our class labels were highly granular, it is reasonable to treat some disagreements as more or less serious in this way. For example, in most applications it would be considerably less serious to confuse Tussock Tundra with Shrubby Tussock Tundra than to confuse it with Tall Shrub. Detailed descriptions of the classes are in table S3.

Results and discussion
3.1. Model and map accuracy Random Forest upsampled out-of-bag overall accuracy was 0.86, with a kappa of 0.85 and weighted kappa of 0.91 (data not shown). Based on extraction of final mapped classes for withheld test plots, overall map accuracy was 0.52, and balanced (mean class-wise) map accuracy was 0.57 (table 4). Kappa was 0.47 for the mapped withheld plots, indicating moderate agreement. Weighted kappa was considerably higher (0.65), suggesting that many disagreements were relatively minor in an ecological sense. The confusion matrix (table 4) shows that the main confusion was among shrubby classes (e.g., Shrubby Moist Non-Tussock Tundra and Low to Tall Moist Shrub) and among moist upland classes (e.g., Tussock Tundra and Moist Non-Tussock Tundra), which was expected because these classes grade into each other on the landscape. Mean user's accuracy was slightly higher than mean producer's accuracy (0.63 versus 0.57). The confusion matrix also illustrates that although the withheld test data represented 25% of all reference data, the imbalanced class sizes led to very small withheld sample sizes for small classes (as small as 4 members, mean 13); this makes the accuracy metrics more difficult to interpret, since there were relatively few opportunities to test accuracy in these smaller classes.

Sources of error and challenges of high-resolution mapping
The logistics of mapping at 20 cm resolution exacerbates uncertainty, especially in the context of such a finely heterogeneous landscape. At such a high resolution, unavoidable sub-meter errors in data collection and processing become important, but remain difficult to quantify. These errors might include small artifacts from imagery stitching and orthorectification; minor spatial variations in lidar ground-control adjustments; centimeter-level errors in GNSS base station location or RTK corrections; and slight mismatches in the location of a reference plot photo center and the measurement of its GNSS coordinates-any of which could displace a reference plot into the wrong pixel and introduce noise into the modeling process. Indeed, the large difference between the value for kappa (0.47) and weighted kappa (0.65) suggests that many misclassifications are attributable to small ecological (gradient-related) offsets or locational errors in the data. Nonetheless, we believe that the high resolution of the maps, their relatively large extent, and the granularity of the classes offset the moderate map accuracy we were able to achieve, especially given the challenges inherent in any vegetation classification. For example, the gradient nature of vegetation communities can make class assignment for a plot ambiguous, since a given location may show characteristics of multiple vegetation communities. This might lead to two similar training plots being assigned to different classes based on an arbitrary distinction, introducing noise to the training data. The granularity of our classification scheme also increases opportunities for confusion. For example, there were five shrubby classes, many of which are structurally similar, as well as two common classes (Tussock Tundra and Moist Non-Tussock Tundra) that each had a corresponding 'Shrubby' class. These 'Shrubby' versions would only be differentiable from their nonshrubby counterparts based on the presence of sparse, low-stature shrubs (<30 cm height), which are difficult to detect in remotely sensed data. Nonetheless, we retained the granular classes because we believe the maps accurately reflect landscape vegetation patterns, and that this classification will be most valuable to map users.

Predictor importance
Although the Random Forest approach is valuable for its flexibility and robustness, it provides only limited options for ecological interpretation of results. Variable importance metrics are derived by examining how much the accuracy (both class-wise and overall) changes when the model is run iteratively with the values of each predictor permuted (i.e., randomly reordered among plots). However, variable importance metrics become less meaningful when input variables are highly correlated (Nicodemus et al 2010, Kuhn and Johnson 2013, Raschka and Mirjalili 2017, as they are in this study, so it is not advisable to interpret the variable importance values rigorously in this case. With that caveat, some results are interesting (table S4). For example, lidar intensity was important to accuracy for several classes, with standard deviation of intensity especially important for very wet vegetated classes (Raised Areas in Wet Tundra, Wet Tundra), where the occurrence of intermixed high-and low-intensity pixels could indicate areas of intermixed vegetation and saturated microsites. Intensity smoothed at 1 m was the predictor with the largest impact on the overall Gini purity index (not shown) and was especially important for very dry vegetated classes (Dry Exposed Tundra, Dry Sheltered Tundra) where intensity would be high, and for Water, where intensity would be low. This adds to emerging evidence that lidar intensity can be an important ecological metric .
Greenness indices (NDVI, DSI, 2GRBI) were important for predicting all shrubby classes, as would be expected, and coarse-scale topographic position was important for predicting Low to Tall Moist Shrub, which generally occurs in low-lying streambed areas. Unsurprisingly, lidar-derived canopy height metrics were important for predicting Tall Shrub, and heat load index was important for predicting Moist Snowbed-but it was somewhat more surprising that percent red appeared important for predicting wet tundra classes (Wet Tundra, Raised Areas in Wet Tundra) and moist upland tundra classes (Moist snowbed, Tussock Tundra, Moist Non-Tussock Tundra), perhaps owing to a comparatively high degree of red absorption by both water and plants in those areas. The standard deviation of the panchromatic (R+G+B), although not at all important for most classes, was highly important for predicting Tussock Tundra, where high values may have indicated the distinctive speckled image texture created by tussock shadows. Indeed, in our early modeling attempts, which  employed 1 m spatial resolution rather than 20 cm, we found that models could not distinguish tussock tundra from non-tussock tundra, perhaps owing to dilution of this signal when averaged over a larger area.
Overall, in a list of 15 most important predictors per class, all 33 predictors appear for at least one class (table S4); in a list of the five most important variables per class, only seven predictors fail to appear for at least one class (hue_sd, swi_05, swi_1, ws_05, ws_1, tpi_25, and tpi_50). Although this may be due in part to the correlated nature of many of the predictor variables, the wide variety of variables appearing as important for different classes also suggests the complicated interrelationships between vegetation communities and their ecological drivers.

Mapping
The final maps are 4-bit unsigned integer 20 cm resolution rasters. Figure 2 shows the full extent of the Toolik map; figure 3 shows a detail from an area within the larger Toolik footprint. Full extents of the Pipeline and Imnavait maps are shown in figures 4 and 5, respectively. Details from the Pipeline and Imnavait maps are included in the supplemental material (figures S1-S2). Minor flight-line artifacts are visible in a few areas, most notably in the southern portion of the Imnavait footprint; these artifacts are likely the result of sun-sensor geometry interacting with the vegetation canopy and the slope of the terrain, potentially along with slight inconsistencies in cloud cover between flight overpasses. The effect appears to cause some tussock tundra areas to be classified as Shrubby Moist Non-Tussock Tundra, presumably because the effect increased the apparent

Conclusion
This project combined predictor datasets derived from two airborne sensors in a machine learning approach that permitted the construction of very high-resolution validated vegetation classification maps based on simple ground reference data. The approach highlights the ability of high-quality airborne data to produce very highresolution geospatial data products over relatively large areas, providing an important link between limitedextent ground-and UAS-based remote sensing products and coarser geospatial data products covering larger extents. The maps should prove to be a valuable resource for Arctic researchers requiring fine-scale reference data for these important research landscapes.

Acknowledgments
This work was supported by NASA Terrestrial Ecology grant NNX12AK83G awarded to LAV, NASA Earth Science Fellowship NNX15AP04H awarded to HEG, and NASA Idaho Space Grant Fellowship NNX10AM75H awarded to TSM. This is a NASA ABoVE affiliated project. Airborne lidar and imagery data were collected and preprocessed by Kodiak Mapping, Inc., Palmer, AK, www.kodiakmapping.com. Field assistance from Elizabeth Fortin was invaluable, and Skip Walker generously provided feedback and advice on the maps. Random Forest modeling was informed and improved by discussions with Patrick Fekety. The manuscript was improved by comments from two anonymous reviewers. The authors are also grateful for the support of the staff and greater research community of Toolik Field Station, Institute of Arctic Biology, University of Alaska Fairbanks, with special thanks to Jason Stuckey, Randy Fulweber, and Jorge Noguera of Toolik GIS for assistance with field GNSS and lidar ground control.