A customized framework for regional classification of conifers using automated feature extraction

Pinyon and juniper expansion into sagebrush ecosystems is one of the major challenges facing land managers in the Great Basin. Effective pinyon and juniper treatment requires maps that accurately and precisely depict tree location and degree of woodland development so managers can target restoration efforts for early stages of pinyon and juniper expansion. However, available remotely sensed layers that cover a regional spatial extent lack the spatial resolution or accuracy to meet this need. Accuracy can be improved using object-based image analysis methods such as automated feature extraction, which has proven successful in accurately classifying land cover at the site level but has yet to be applied to regional extents because of time and computational limitations. Using Feature Analyst™, we implemented our framework with 1-m² reference imagery provided by the National Agriculture Imagery Program to classify conifers across Nevada and northeastern California. Our resulting binary conifer map has an overall accuracy of 86%. We discuss the advantages in accuracy and precision that our framework provides compared with other classification methods.

● This framework allows automated feature extraction for large quantities of data and very high spatial resolution imagery
● It leverages supervised learning
● It results in high-accuracy maps for regional spatial extents


Regional mapping applications
Improved management of sagebrush ecosystems in the Great Basin requires high-resolution, high-accuracy maps of pinyon and juniper (hereafter, conifer) distribution and density (measured as percent canopy cover) to identify potential treatment areas [8]. However, contemporary remote sensing products lack the resolution to provide accurate conifer locations and to identify early-phase woodland expansion (Fig. 1), both of which are crucial in planning targeted conifer treatment [8]. We applied object-based image analysis (OBIA) to map conifers at a resolution of 1-m² using National Agriculture Imagery Program (NAIP; [15]) imagery collected in 2010 and 2013 as our reference data and the Feature Analyst™ toolbox [13] for Esri® ArcGIS™ Desktop ([5], Release 10.2, Redlands, California). Feature Analyst™ is an automated feature extraction (AFE) method that semi-automates the extraction of target features using a machine learning algorithm trained to delineate image objects based on the spectral and spatial signatures of defined cell neighborhoods [12]. AFE outperforms pixel-based methods [17] and is recognized as one of the most accurate OBIA methods available [12]. However, the user investment and computational requirements of AFE have restricted its use in regional mapping applications. We present a new framework for conducting broad-scale feature extraction using very high spatial resolution (VHSR) imagery that preserves the benefits of high-user-involvement algorithms, such as improved classification, while optimizing efficiency with batch geospatial processing. We extracted conifer (mostly pinyon-juniper; however, we could not differentiate among species) image objects to create 1-m² resolution binary conifer rasters across our study extent. We assessed mapping accuracy by analyzing errors of omission and commission using reference imagery and calculating overall accuracy for our mapping product.
Finally, we qualitatively discuss how our AFE-based results compare with those recently derived using other techniques for high-resolution mapping of conifers in rangeland ecosystems [6]. The resulting classification products are presented by Gustafson et al. [8] with accompanying descriptions of their significance to management applications.

Study area
Conifer mapping was conducted for all 61 Nevada Department of Wildlife sage-grouse Population Management Units (PMU; Fig. 2), as pinyon-juniper treatment is significantly motivated by sage-grouse habitat restoration [3]. We used the NAIP digital orthophoto quarter quads (DOQQs) because they are freely available, orthorectified, VHSR (< 4-m²) products that comprise four spectral bands (three visible-light bands [RGB] and a near-infrared band), which allow classification of vegetation types, including conifers. Our study extent included 6,230 DOQQs from Nevada and California, along with small areas along the state borders of Oregon, Idaho, and Utah that fell within the buffered PMU boundary. We buffered the extent of the PMUs by 10 kilometers (km) to prevent inaccurate moving window (or neighborhood) calculations within the study area along boundaries where "No Data" values would occur. We selected this buffer size because sage-grouse typically do not use habitat more than 8 km from lek locations [2,9]. A small section of our study area along the southern boundary was truncated by the Nevada Military Test and Training Range, where the NAIP imagery was unavailable or redacted. We analyzed PMUs on a tile-by-tile basis by intersecting polygon boundaries of the DOQQs with the PMUs because inconsistencies among DOQQs (e.g., varying image quality, changes in lighting, inconsistent spectral values, shadows, parallax, and processing artifacts) required independent analysis of each tile for greatest classification accuracy. We further divided larger PMUs into smaller, more manageable zones. Zone boundaries followed DOQQ boundary polygons and were placed in low- to non-conifer areas to minimize the potential for seamlines in classifications.
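As a conceptual illustration of the tiling step (not the authors' GIS workflow, which operated on the actual DOQQ and PMU polygons), the sketch below buffers a simplified, axis-aligned PMU extent by 10 km and selects the tile footprints that intersect it, so that moving-window calculations near the PMU edge have valid data. All coordinates, tile sizes, and function names are hypothetical and in projected metres.

```python
def buffer_rect(rect, r):
    """Expand an axis-aligned (xmin, ymin, xmax, ymax) rectangle by r metres."""
    xmin, ymin, xmax, ymax = rect
    return (xmin - r, ymin - r, xmax + r, ymax + r)

def rects_intersect(a, b):
    """True when two (xmin, ymin, xmax, ymax) rectangles overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

# Hypothetical 20 km x 20 km PMU extent and 10 km x 10 km tile footprints
pmu = (0, 0, 20_000, 20_000)
tiles = [(x, y, x + 10_000, y + 10_000)
         for x in range(-30_000, 40_000, 10_000)
         for y in range(-30_000, 40_000, 10_000)]

buffered = buffer_rect(pmu, 10_000)
selected = [t for t in tiles if rects_intersect(t, buffered)]
```

In a real projection the buffered PMU is an arbitrary polygon, so a GIS intersection (as the authors performed) replaces the rectangle test; the selection logic is otherwise the same.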

Automated feature extraction
We individually reviewed each tile for the presence of conifers and processed tiles containing conifers using the Feature Analyst™ Supervised Learning Wizard, which applies a supervised learning algorithm that extracts features meeting spectral and contextual specifications via a set of training polygons. We digitized a representative sample of conifer image objects across the entire tile to create this polygon training set. We distinguished conifers from other vegetation based on the hue of red under traditional false-color settings and verified identifications using Google Earth™ (version 7.1.2.2041, 2013) [7]. We digitized clearly identifiable individual conifers for the training polygons to ensure the classification of isolated trees in low canopy density areas and used the spectral properties of the cells under false-color settings to delineate the conifer crown. This process trained the OBIA algorithm to distinguish trees from shadows and other vegetation types, minimizing misclassification. Each digitized image object consisted of at least three cells, a canopy area corresponding to trees tall enough (≥ 3 m) to emerge above the sagebrush canopy. The number of samples per tile varied with image quality and color variation but always included a minimum of five training polygons. We then used the training set as the input for Feature Analyst™'s Supervised Learning Wizard to generate conifer features from the four-band NAIP image.
In Feature Analyst™'s Supervised Learning Wizard, we specified parameters that best represented trees in the supervised machine learning algorithm. We defined our search neighborhoods using the "Natural Feature" feature selector with the "Bullseye 3" pattern over a 25-m² moving window, which together define the spatial and spectral context used by the algorithm to extract features. The "Natural Feature" selector discouraged the algorithm from using hard lines to generate image objects, and the Bullseye search window reduced processing time by reducing the number of cells supplied to the learning algorithm while still representing the neighborhood. We specified a "pattern width" of 5 m to match the Bullseye moving window; we chose 5 m in both instances because this width was larger than our minimum mapping unit, allowing the algorithm to discern features based on collective spectral qualities while remaining small enough to extract individual trees. It also allowed us to exclude smaller vegetation with similar spectral signatures and reduce the chance of errors of commission. Detected features > 3-m² were aggregated. Several rounds of testing determined that these parameters best classified trees.
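Feature Analyst™'s actual Bullseye pattern is proprietary, so the sketch below only illustrates the general idea of a sparse search window: the focal cell is always sampled, and the rest of the 5 × 5 neighborhood is thinned (here by a simple checkerboard rule of our own devising) so that roughly half as many cells feed the learning algorithm while the neighborhood is still represented.

```python
import numpy as np

def bullseye_mask(size=5):
    """Boolean sampling mask for a size x size search window.

    Always keeps the centre (focal) cell; elsewhere, keeps every other
    cell in a checkerboard pattern to cut the cell count supplied to the
    learning algorithm. This is a conceptual stand-in, not the actual
    Feature Analyst pattern.
    """
    assert size % 2 == 1, "window size must be odd"
    c = size // 2
    mask = np.zeros((size, size), dtype=bool)
    mask[c, c] = True  # always sample the focal cell
    for i in range(size):
        for j in range(size):
            if (i + j) % 2 == 0:  # checkerboard thinning of the neighborhood
                mask[i, j] = True
    return mask
```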
We checked output conifer features for accuracy against the NAIP reference imagery. Typically, the initial results were over- or under-classified. If the results were under-classified, more training polygons were added to the initial set and a new supervised classification was performed to replace the initial output. In some cases where a single learning algorithm could not be trained to recognize all conifer features (i.e., extremely dense stands), multiple algorithms were run on the same tile to target the variety of feature types, and these outputs were merged. If the results were over-classified, we digitized samples of incorrect, correct, and missed features, which were incorporated into a hierarchical supervised classification performed on the previous OBIA output. This hierarchical classification continued until misclassification errors were minimized based on visual comparison of feature outputs to the NAIP reference imagery. The "Hierarchical Learning" process in Feature Analyst™ comprises three sub-processes that retrain the algorithm and optimize the original output: (1) incorrect features are digitized for removal using the "Begin Removing Clutter Tool"; we included removal polygon samples of all possible misclassifications, such as shadows, riparian vegetation, and other non-conifer features. (2) The correct output features are then identified to retrain the OBIA algorithm. (3) Features that were missed by the previous supervised learning run are digitized and added to the training set using the "Begin Adding Missed Features Tool." After the hierarchical learning process was complete, we carried out additional geoprocessing to dissolve overlapping features, repair polygon geometry, and remove features < 3-m².
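The final cleaning step, removing features smaller than the 3-m² minimum mapping unit, can be illustrated with a simple connected-component filter on a binary grid at 1-m² resolution. This is our conceptual stand-in for the geoprocessing tools actually used, not the authors' implementation.

```python
from collections import deque

def remove_small_features(grid, min_cells=3):
    """Zero out 4-connected groups of 1s smaller than min_cells.

    grid: list of lists of 0/1 (one cell = 1 m^2). Returns a new grid with
    features below the minimum mapping unit removed.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                # breadth-first search over one connected feature
                component, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    component.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(component) < min_cells:  # below minimum mapping unit
                    for y, x in component:
                        out[y][x] = 0
    return out
```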
Any large misclassifications that occurred as a result of spectral overlap (e.g., algae in standing water, irrigated agricultural fields, wet meadows, riparian areas, or patches of other non-conifer vegetation) were removed using custom polygon masks. We then converted the cleaned shapefiles for each tile to 2-bit, binary VHSR rasters (1-m²).
We carried out several post-processing steps on output feature layers for each tile to further improve results. We reviewed tiles for obvious errors of omission and commission. We also checked against neighboring tiles for acute seamlines, which indicate classification disagreement resulting from tile-based analysis; tiles were iteratively re-analyzed if seamlines were pervasive. Raster output was considered 'clean' at the completion of this process and was then mosaicked for each zone or PMU in ERDAS Imagine (2013, Leica Geosystems, Atlanta, Georgia) using the "Automatic Most Nadir Seam" setting, which minimizes seamlines by placing them in overlap areas where the distance to the center point of each image is equal.
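ERDAS Imagine's "Automatic Most Nadir Seam" implementation is not public; the sketch below only illustrates the stated rule as we understand it: in areas of overlap, each cell is taken from the image whose center point is nearer, so the implied seam falls where the two distances are equal. The grid shape and tile centers are hypothetical.

```python
import numpy as np

def most_nadir_choice(shape, center_a, center_b):
    """For every cell in an overlap region, choose image 'a' (0) or 'b' (1)
    according to which image centre is nearer; the seam lies along the set
    of cells where the two distances are equal."""
    yy, xx = np.indices(shape)
    dist_a = np.hypot(yy - center_a[0], xx - center_a[1])
    dist_b = np.hypot(yy - center_b[0], xx - center_b[1])
    return (dist_b < dist_a).astype(np.uint8)

# Two hypothetical overlapping tiles whose centres sit at columns 1.5 and 5.5;
# the seam should fall between columns 3 and 4.
choice = most_nadir_choice((4, 8), center_a=(1.5, 1.5), center_b=(1.5, 5.5))
```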

Accuracy assessment
To assess the accuracy of the OBIA conifer classification, we constructed an error matrix [4] for each PMU. We first generated stratified random points within the 1-m² conifer and non-conifer classes and compared our classification at those locations against NAIP reference imagery. We standardized sampling effort by allocating 100 points to the average classified area (km²) of the PMUs, yielding a km²-per-point value; we then divided the area of each PMU (km²) by this value to weight the number of random points generated for each PMU by its area, with a required minimum of 25 points per PMU. Each random point was visually inspected for errors of omission (failing to identify a conifer) and commission (incorrectly classifying non-conifer as conifer), and the results were entered in the error matrix (Table 1). We calculated the overall accuracy of conifer and non-conifer classification in each PMU, which identifies the percent of correct classifications among the total cases examined (Table 2; Fig. 3). To investigate bias in the OBIA towards errors of commission or omission, we also calculated the user's and producer's accuracy, respectively (Table 2, Fig. 3). The user's accuracy evaluates the reliability of the output conifer class by determining the percentage of cases correctly attributed to each class, whereas the producer's accuracy evaluates the performance of the classification algorithm by identifying the percent detection of all cases in each class. The values in the confusion matrices were used to perform an estimated accuracy coefficient (kappa) analysis (Table 2; [4]). The kappa analysis generates the kappa coefficient (K-hat), which represents the percent accuracy adjusted for correct classification due to random chance. A K-hat > 60% indicates substantial agreement between classification and truth, and values > 80% indicate almost perfect agreement [10].
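The point allocation and error-matrix statistics described above reduce to straightforward arithmetic. The sketch below is our own illustration (function names are hypothetical, and it assumes an error matrix with rows as reference truth and columns as classification; transpose if your convention is the reverse): it pro-rates validation points by PMU area with a 25-point floor, then computes overall, user's, and producer's accuracy and K-hat.

```python
import numpy as np

def allocate_points(pmu_areas_km2, points_per_avg_pmu=100, minimum=25):
    """Pro-rate validation points by PMU area: the average-sized PMU
    receives 100 points, with a floor of 25 points per PMU."""
    km2_per_point = np.mean(pmu_areas_km2) / points_per_avg_pmu
    return [max(minimum, round(a / km2_per_point)) for a in pmu_areas_km2]

def accuracy_metrics(matrix):
    """Overall, user's, and producer's accuracy plus kappa from an error
    matrix with rows = reference (truth) and columns = classification."""
    m = np.asarray(matrix, dtype=float)
    total = m.sum()
    overall = np.trace(m) / total
    users = np.diag(m) / m.sum(axis=0)      # 1 - commission error, per mapped class
    producers = np.diag(m) / m.sum(axis=1)  # 1 - omission error, per reference class
    chance = (m.sum(axis=0) * m.sum(axis=1)).sum() / total ** 2
    kappa = (overall - chance) / (1 - chance)  # K-hat: accuracy adjusted for chance
    return overall, users, producers, kappa
```

For example, a two-class matrix [[45, 5], [10, 40]] gives 85% overall accuracy and, since chance agreement is 50%, a K-hat of 0.70 (substantial agreement under the thresholds cited above).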

Framework benefits
Our analysis framework utilized intensive AFE to classify target features across an entire region from VHSR imagery, which resulted in comprehensive and highly accurate outputs (Table 2). OBIA methods like AFE that involve high levels of user interaction are known to be time consuming [1,6,14], which has restricted the scope of their application [1,11]. Our framework reduces processing time and computational demand to make the implementation of such OBIA methods feasible across relatively large spatial extents. This reduction was primarily accomplished by leveraging the user-friendly, semi-automated, and inductive learning algorithms in Feature Analyst™ to decrease user investment and processing time [11,12]. We also took advantage of parameters such as the Bullseye pattern to reduce the amount of data processed [12]. However, AFE and other semi-automated OBIA methods require substantial operator investment ([14,16]; Falkowski and Evans, 2012). For example, to produce our conifer classification, each tile required individualized training polygon development and supervised (often hierarchical) learning runs. The volume of work necessary to map the state with our framework required 10 analysts working concurrently for several months. The high operator involvement of Feature Analyst™ promotes more accurate classification [12] but decreases reproducibility and increases analysis time [14]. To balance these tradeoffs, we integrated our Feature Analyst™ workflow with many time-saving geoprocessing steps, such as analyzing imagery on a tile-by-tile basis to reduce computation time and allow for simultaneous, customized AFE, and performing validation within PMUs. Geoprocessing steps such as rasterization and mosaicking were iterated within PMUs in ArcGIS using Model Builder ([5], Release 10.2, Redlands, California) so that multiple classified tiles could undergo post-processing in quick succession and simultaneously.
We also used Model Builder to iterate the calculation and reclassification of percent canopy cover smoothed by the 50-m radius neighborhood within sectors. Our framework also allows for several time-saving mechanisms going forward. For example, post-processing could be further automated by iterating across PMUs and sectors. User investment could be reduced on the front end by mosaicking the NAIP tiles into a single layer; however, analysis of VHSR imagery at such a large spatial extent remains limited by processing power. Feature Analyst™ also has several features that facilitate further automation via batch processing, such as the ability to save training polygons and learning algorithms for repeated use, allowing analysts to apply a single training set and model to all imagery and greatly reduce the user investment and processing time required for each tile [12]. We could not use these functions largely because of the inconsistent quality of NAIP tiles across our mapping extent. However, such features are available for future applications and could easily be incorporated into this existing framework.
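The batch post-processing idea, applying the same cleaning step to many tiles at once, can be sketched in plain Python. This is our illustration rather than the Model Builder workflow; `postprocess_tile` is a hypothetical stand-in for the real per-tile work (rasterization, geometry repair, small-feature removal).

```python
from concurrent.futures import ThreadPoolExecutor

def postprocess_tile(tile_id):
    """Hypothetical stand-in for per-tile cleanup; a real implementation
    would load the tile, clean it, and write the output raster."""
    return f"{tile_id}:done"

def run_batch(tile_ids, workers=4):
    """Apply the same post-processing step to many tiles concurrently;
    Executor.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(postprocess_tile, tile_ids))
```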

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.