A carbon monitoring system for mapping regional, annual aboveground biomass across the northwestern USA

This paper presents a prototype Carbon Monitoring System (CMS) developed to produce regionally unbiased annual estimates of aboveground biomass (AGB). Our CMS employed a bottom-up, two-step modeling strategy beginning with a spatially and temporally biased sample: project datasets collected and contributed by US Forest Service (USFS) and other forestry stakeholders in 29 different project areas in the northwestern USA. Plot-level AGB estimates collected in the project areas served as the response variable for predicting AGB primarily from lidar metrics of canopy height and density (R2 = 0.8, RMSE = 115 Mg ha−1, Bias = 2 Mg ha−1). This landscape model was used to map AGB estimates at 30 m resolution where lidar data were available. A stratified random sample of AGB pixels from these landscape-level AGB maps then served as training data for predicting AGB regionally from Landsat image time series variables processed through LandTrendr. In addition, climate metrics calculated from downscaled 30 year climate normals were considered as predictors in both models, as were topographic metrics calculated from elevation data; these environmental predictors allowed AGB estimation over the full range of observations with the regional model (R2 = 0.8, RMSE = 152 Mg ha−1, Bias = 9 Mg ha−1), including higher AGB values (>400 Mg ha−1) where spectral predictors alone saturate. For both the landscape and regional models, the machine-learning algorithm Random Forests (RF) was consistently applied to select predictor variables and estimate AGB. We then calibrated the regional AGB maps using field plot data systematically collected without bias by the national Forest Inventory and Analysis (FIA) Program. We found both our project landscape and regional, annual AGB estimates to be unbiased with respect to FIA estimates (Biases of 1% and 0.7%, respectively) and conclude that they are well suited to inform forest management and planning decisions by our contributing stakeholders. Social media abstract Lidar-based biomass estimates can be upscaled with Landsat data to regionally unbiased annual maps.


Introduction
Humans heavily influence forest carbon (C) dynamics directly via land management as well as indirectly by changing greenhouse gas composition and climate (Birdsey and Pan 2015). The U.S. Forest Service (USFS) Forest Inventory and Analysis (FIA) Program plays a critical role in monitoring national forest trends, as do the National Aeronautics and Space Administration (NASA) and the U.S. Geological Survey (USGS) with Landsat and other remote sensing data products (Masek et al 2008(Masek et al , 2013(Masek et al , 2015. Light detection and ranging (lidar) is the most accurate remote sensing technology for estimating forest and stand structure attributes (Lefsky et al 2001, 2002b, Hyde et al 2007, Sexton et al 2009, Mondino et al 2020. Thus, there is considerable motivation to integrate lidar measurements into forest inventories (Duncanson et al 2010, Johnson et al 2014, Sheridan et al 2015. Indeed, land managers have cumulatively invested and continue to invest millions of dollars for commercial off-the-shelf (COTS) airborne lidar collections to achieve multiple natural resource management objectives . Many COTS lidar collections aim to update the National Elevation Dataset (NED) while acquiring data towards national lidar coverage, partially funded by the USGS 3D Elevation Program (3DEP) (Sugarbaker et al 2017). Applications in geomorphology, hydrology, and road planning may only require a Digital Terrain Model (DTM) of the bare earth surface. However, applications of lidar for forest inventory require further investment into field plot data collections to estimate most stand attributes of interest (volume, basal area, biomass, etc.), often at substantial added expense. Forest managers can justify the added cost for field plot data collections because lidar-based inventory can map stand structure attributes with high precision at a pixel level to display within-stand variation, whereas the spatial resolution of traditional forest inventory is limited to the stand or stratum level (Hummel et al 2011).
Lidar is known to provide accurate characterization of forest structure at the plot, stand, and landscape levels (Kane et al 2010). Therefore, multiple COTS lidar and associated field plot data collections, which we henceforth term 'project datasets,' have unrealized potential to more broadly inform forest planning, management, and monitoring efforts at the regional level. Integrating multiple project datasets into a common database adds further value to stakeholders' already large investments into project data collections. However, the caveat is that they collectively represent a spatially biased sample of forests in the region. This is a key distinction of our project datasets from FIA plot data, which provide an unbiased, systematic sample of forest conditions in space and time, as is needed in support of forest planning and monitoring, reporting, and verification (MRV) (Tinkham et al 2018, Hurtt et al 2019. The use of project datasets that are disjunct in time or space assumes the transferability of estimates from project landscapes where both lidar and field plot data are available to project areas where only lidar data are available. It has been demonstrated that estimates of stand structure attributes can be transferred in both time (Fekety et al 2015) and space (Fekety et al 2018) in northern Idaho, where coniferous forest composition and structure may be the most diverse in the northwestern USA. These studies show that although there is some loss in the accuracy and precision of stand structure attributes transferred across time or space, acceptable estimates are still obtained that meet management needs, or at least suffice until more contemporary or local inventory plot measures can be obtained (Fekety et al 2015(Fekety et al , 2018. While airborne lidar coverage is complete in many eastern USA states, lidar collections in the West are spatially (Fekety et al 2018) and temporally (Fekety et al 2015) disjunct. Yet western USA lidar coverage is extensive enough that all major forest types and conditions are represented in many locations, primarily as a single snapshot in time (as early as 2002 in this study), but with little overlapping coverage (i.e. repeat observations).
For mapping, radiometrically and geometrically calibrated Landsat image time series provide the spatial and temporal continuity needed for regional and national forest planning and monitoring (Banskota et al 2014, Wulder et al 2019. For these reasons, the vast majority of predictive models for mapping stand structure attributes use FIA plots for calibration (because they are statistically unbiased) and Landsat images as the primary source of predictor variables (because they are spatially and temporally continuous and consistent) (Powell et al 2010, Zhu and Liu 2015, Kennedy et al 2018a. The unrivaled duration of moderate spatial resolution Landsat image time series and recent availability of advanced time series algorithms and products (Masek et al 2008, Huang et al 2010 provides an important temporal dimension to estimating not just AGB but changes in AGB to improve understanding of the carbon consequences of forest management and disturbance (Kennedy et al 2018a(Kennedy et al , 2018b. However, the 30 m resolution of Landsat TM is less than satisfying to land managers that seek to make operational decisions at the local level. Moreover, Landsat and for that matter all passive optical imagery suffers from lack of sensitivity in cases of high canopy closure  or in high biomass forests, with the signal saturating at an AGB density of~150-250 Mg ha −1 or more depending on the study (Huete et al 1997, Steininger 2000, Dong et al 2003, Avitabile et al 2012, Zhu and Liu 2015, Durante et al 2019. The sparseness of the FIA sample plot grid at local scales also compromises the local accuracy of regional map products, and the spatial configuration of FIA plots makes them problematic to relate to Landsat pixel data directly for model development, given the spatial mismatch between the 7.3 m radius, round configuration of an FIA subplot and 30 m × 30 m square Landsat pixels (Tinkham et al 2018). Inevitably, the four subplots will intersect a different number of pixels and in varying proportions; the subplots occasionally (10% of plots in this study) also represent multiple condition classes, such that forest edge effects add further noise to any modeled relationships (figure 1).
Finally, the geolocation accuracy of the center subplot varies by USFS FIA Region, depending on the protocol and the quality of the Global Navigation Satellite System (GNSS) receiver used in the field; the three peripheral subplots, while systematically laid out from the center subplot by consistent distances (36.6 m) and bearings (120 • , 240 • , 360 • ) are usually not georeferenced, making them more subject to locational inaccuracy due to additive errors in accounting for horizontal distance on slopes and for magnetic declination on compass azimuths (Zald et al 2014).
Accuracy issues aside, FIA plot locations are confidential and not readily available to users. This policy, although justifiable for maintaining plot integrity and landowner trust, has been a major deterrent for forestry applications that involve developing relationships to remotely sensed data, particularly globally available 30 m Landsat TM imagery (Tinkham et al 2018). The scale of lidar point cloud data provide a way to bridge the scale gap, incompatible shapes, and location issues that complicate the relationships between Landsat pixel data and FIA or other field plots (Zald et al 2014(Zald et al , 2016. Lidar points can be binned just as easily within fixed-radius plot footprints as within 30 m pixels, such that the lidar points are tightly coupled to tree measures or Landsat pixel values, respectively. Indeed, this is the basis for small area-based predictive modeling of stand structure attributes for lidar-based forest inventory, as demonstrated in multiple research papers and widely implemented operationally by forest managers (Naesset 2004, Hudak et al 2006, 2008, Hyyppä et al 2008, White et al 2013. In this paper, we present a prototype Carbon Monitoring System (CMS) that addresses three objectives in three steps.
• The first step leverages the spatial strength of lidar for characterizing AGB at the scale of forest inventory plots and across lidar project landscapes; our hypothesis is that AGB mapped from lidar data at 30 m resolution represents the full range of AGB conditions and can be used to train a regional model to estimate AGB where lidar data are unavailable in space or time. • The second step leverages the temporal strength of Landsat for monitoring AGB at the regional scale; our hypothesis is that annual Landsat time series metrics that capture disturbance dynamics can be combined with metrics derived from 30 year climate normals and static topography to estimate AGB across large productivity gradients without asymptote. • The third step leverages the strength of FIA plots for design-based, unbiased AGB estimation; our hypothesis is that FIA plot data can be used for regional AGB map calibration instead of regional AGB model calibration as is conventional.
We propose that our prototype CMS provides an MRV framework that can provide unbiased AGB estimates over time and engage stakeholders, including those who contributed the project datasets upon which our prototype CMS is based.

Study area
The study area is comprised of the forested portions of Washington, Oregon, Idaho, and the 12 counties in Montana located west of the Continental Divide (figure 2). The climate for this region transitions from continental in the east to maritime in west, with mean annual precipitation ranging greatly from 196 mm to 3041 mm (Current Results 2019). The major mountain ranges are the Northern Rocky Mountains, Cascade Range, and the Coast Range, with western slopes being more productive then eastern slopes due to steep orographic precipitation gradients. Forests east of the Cascade Range are primarily coniferous whereas those west of the Cascade Range are a mixture of conifer and deciduous species.

Project datasets
No field data were collected specifically for this study. Rather, field plot and lidar project data collected by stakeholders for forest inventories were used, contributed by Federal State University, tribal, and private organizations, hereby collectively termed our stakeholders (N = 29), which collectively totaled 3805 field-measured plots (figure 2). The requirements for acceptable plot data were: (1) fixed-area plots; (2) georeferenced with a GNSS capable of differential correction; (3) established within ±3 years of an overlapping lidar collection; and (4) not disturbed in the time between field and lidar data collections.
Field plots consisted of a single, fixed-radius plot ranging between 169 m 2 and 809 m 2 , with the exception of 60 nested plots from Priest River Experimental Forest, Idaho, which were established following the FIA protocol of four distributed subplots (Bechtold and Patterson 2005). Field plots in the other 28 project datasets were established following stratified random designs, usually as support for the corresponding lidar collection, although the stratification variables used differed among contributing stakeholders. Across all field protocols, a minimum of species, status (live or dead), and diameter at breast height (DBH) were recorded for all trees with a DBH greater than or equal to 12.7 cm and consequentially trees with DBH less than 12.7 cm were excluded from this study (including the FIA plot data used for validation). Tree heights were measured on subsamples of the trees; in total, 48 836 out of 97 386 trees in the database (50.1%) had a height measurement.
AGB was calculated from project field plot tree measurements using the default equations found in local variants of the Fire and Fuels Extension (FFE) of the Forest Vegetation Simulator (FVS) (Rebain 2015, Dixon 2018. These equations are based on a series of regional volume and biomass equations. The aboveground portion of the live and standing dead trees were summed to a single, plot-level AGB value; the belowground portion of the trees and non-tree species were excluded from the AGB estimates. These plot-level AGB estimates were the response variable for the landscape model (section 2.3), the estimates of which informed the regional model (section 2.4), via a two-tiered modeling strategy (figure 3).

Landscape model
The landscape model predicted AGB as the response variable using climate, topographic, and lidar metrics as predictor variables (table 1), to estimate AGB where lidar data were available. Local and regional databases were searched to identify discrete airborne lidar collections located in the study area. Lidar collections included in this prototype CMS covered forested ecoregions and were collected between 2002 and 2016. Lidar data were processed using FUSION v3.60, a free and open source lidar processing software developed by the USFS (Mcgaughey 2015). Plotlevel lidar metrics used as predictor variables in the landscape model (table 1) were calculated by clipping and height-normalizing the lidar point cloud. Project-level gridded lidar metrics used in creating AGB maps (maps are described below) were also created with FUSION. Gridded lidar metrics for Idaho and eastern Washington were created at a 30 m spatial resolution that matched the LandTrendr data (i.e. EPSG:5070; https://epsg.io/5070). Gridded metrics calculated with the same FUSION parameters at 30 m resolution for Oregon and the remaining portion of Washington already existed and were reprojected and resampled to match the spatial reference system used in this project. In total, 13 007 443 ha of lidar coverages were processed (figure 2).
Additional predictor variables used in the landscape model (table 1) included plot-level and gridded climate metrics (1961-1990 normals) obtained from the Climate-FVS Ready Data Server (Crookston 2016). Also included were topographic metrics based on Shuttle Radar Topography Mission (SRTM) 1 Arc-Second Global products because of its near global coverage (USGS 2018). SRTM elevation rasters were reprojected and resampled to 30 m resolution, and topographic metrics were calculated in GRASS GIS v7.4. Plot-level topographic estimates, calculated as the area-weighted mean of the pixel values intersecting the plot footprint, were extracted from these rasters.
Random Forests (RF) regression (Breiman 2001) was chosen to predict AGB because it is a flexible, non-parametric approach that can account for non-linear variable interactions. RF is encoded to bootstrap the dataset to generate a virtual forest of regression trees while employing an out-of-bag sampling with replacement strategy that provides resistance to overfitting (Hudak et al 2008, Latifi and Koch 2012, Hayashi et al 2015. To further guard against overfitting, highly correlated explanatory variables (r ≥ 0.9) were removed, and the Model Improvement Ratio statistic available in the model selection tool of the R package rfUtililies (Evans and Murphy 2017) was used to specify the model using the rf.modelSel tool. Based on preliminary analysis, the parsimony parameter of rf.modelSel was set to 1%; the RF model error was found to stabilize after generating 200-400 trees, hence the default RF setting of 500 regression trees was maintained to assess RF model fit as reported by the pseudo-R 2 statistic. All field plots were used to construct the landscape model, since the AGB estimates would ultimately be validated against independent FIA data (section 2.5). The landscape model was applied to rasters of predictor variables that covered the spatial extent of available lidar collections to estimate mean AGB calculated from the 500 trees. The standard deviation in estimated AGB calculated across the 500 trees was also mapped to show pixel-level model uncertainty (Urbazaev et al 2018).

Regional model
The regional model used a stratified random sample of pixel-level estimates of AGB from the landscape model as the response variable, and climate, topographic, and Landsat image time series (LandTrendr; Kennedy et al 2010Kennedy et al , 2015 metrics as predictor variables, to estimate AGB annually and synoptically across the study area. A post-stratification and random sample of pixels from the AGB maps derived from the landscape model were used to train the regional model. Lidar collections often cross ecological and land-use boundaries; therefore, AGB pixels were subsetted to include only the forested ecoregions as determined by the Commission for Environmental Cooperation land classification system (CEC 1997, Omernik andGriffith 2014). Also, AGB pixels identified as water or developed land (e.g. urban areas)  (2011) according to the 2011 National Land Cover Database (Homer et al 2015) were excluded from the random sample. AGB pixels mapped with the landscape model were binned into 10 Mg ha −1 bins and 500 pixels were randomly selected from each bin. When a bin had fewer than 500 pixels, one-half of the pixels were randomly selected. This procedure generated a sample of 117 808 pixels that was divided in half into training and testing datasets. Landsat imagery from 1984-2016 were processed through the LandTrendr algorithm (Kennedy et al , 2015 to create a Normalized Burn Ratio (NBR) image time series. Intermediate LandTrendr products, namely annual tasseled-cap brightness, greenness, and wetness values and disturbance layers, were used as predictor variables in the regional model (table 2) along with the fully derived LandTrendr outputs quantifying the magnitude and time since last disturbance. These LandTrendr products were calculated on an annual basis and provide the temporal component to this CMS, such that annual maps covering the entire study area would be created. The year of the lidar collection determined which year of the LandTrendr time series was selected to extract a given 30 m AGB pixel value to be used for model training.
AGB is strongly correlated to tree height across forest types (Lefsky et al 2002a); therefore, the Simard et al (2011) global canopy height map based on nominal 2005 GLAS height estimates and 2000 SRTM data was included as an explanatory variable. Climate and topographic metrics described in the landscape model (table 1) were also used in the regional model (table 2). The regional model was specified using the same RF model selection procedure as the landscape model, and maps of AGB means and standard deviations calculated across 500 trees were also generated in the same manner.

Bias correction
Mapped estimates of AGB from the regional model were biased compared to FIA estimates at the plot level; we therefore calibrated the maps to make them consistent with unbiased FIA estimates of AGB at both the pixel-and county-level aggregate scales. The measurement year for each FIA field plot was determined and the corresponding pixel value was extracted from the annual regional map of that same year. Annual bias correction factors were calculated as the ratio of the mean of the plot-level FIA AGB estimates to the mean of the mapped AGB estimates. Because our maps included years not present in the FIA database (2000)(2001)(2002)(2003) for some states, a relationship between calendar year and annual bias correction factor was calculated using linear regression for the years when FIA measurements were available in all four states, such that the map calibration was regionally balanced. Annual values from the line of best fit were used to calibrate the corresponding map. Equivalence plots (Robinson et al 2005) were used to test whether mapped AGB estimates were statistically equivalent to AGB estimated independently by FIA.

Forest/non-forest mask
Non-forested pixels in the annual, regional AGB maps were masked using the ALOS PALSAR 2009 forest/non-forest (F/NF) mask (PALSAR JAXA 2014, Clearly, commission and omission errors are associated with the 2009 PALSAR mask, as they would be with any mask. Had PALSAR or similar F/NF masks been available for all years (not just 2007-2010, 2015 and 2016) in the 2000-2016 AGB time series, an annual F/NF mask could have been applied to each annual AGB map. Using multiple F/NF masks introduces another source of variability and error in AGB accounting related to the datasets, which we judged exceeded the variability of interest related to changing conditions in the scene (i.e. disturbance effects). Although the National Land Cover Data (NLCD) map products (Homer et al 2015) had F/NF masks in 2001,2004,2006,2008,2011,2013, and 2016, their spatial extent was limited to the contiguous USA. Therefore, the 2009 PALSAR was applied to give our prototype CMS the potential for broader applicability. However, we compared the influence of mask choice on county-level biomass estimates to test if local AGB differences between masks would translate to differences in scaled-up AGB aggregations.

Landscape model
Mean canopy height was the most important predictor of AGB in the landscape model ( figure 4(a)), but density metrics across all vertical canopy strata comprised the largest group of predictors. Climate and topographic metrics were less important than the lidar metrics, with mean annual precipitation and transformed aspect being the top predictors, respectively ( figure 4(a)). The landscape model explained 80% (RF Pseudo-R 2 = 0.80) of the variability in AGB across all project areas; the RF model RMSE was 114.6 Mg ha −1 (46.9 %) and Bias was 1.9 Mg ha −1 (0.8 %) ( figure 5(a)). Even though these project areas covered a wide range of conditions, model predictions did not asymptote with high levels of AGB, as is commonly observed with lidar-derived estimates of AGB or related forest structure attributes.

Regional model
We considered AGB estimates from the landscape model, available where lidar had been flown, as representative of the full range of forest conditions across the region; therefore, a stratified random sample of AGB pixels from those maps was used to train the regional AGB model. Disturbance and spectral metrics had lower variable importance scores than the climate and topographic metrics and the global canopy height product (Simard et al 2011), which ranked fifth in importance ( figure 4(b)). The most important climate and topographic metrics were mean annual precipitation (top predictor overall) and slope x sine(aspect), respectively ( figure 4(b)). The regional model explained almost as much variance (RF Pseudo-R 2 = 0.78, figure 5(b)) as the landscape model ( figure 5(a)), while the precision decreased (RMSE of 161.0 Mg ha −1 ) (27.0 %) ( figure 5(b)). The regional model Bias of 2.9 Mg ha −1 (0.5 %)  was similar to the landscape model, and the AGB estimates also did not asymptote ( figure 5(b)). Validation of the regional model AGB estimates with the test dataset from the landscape model showed very slightly improved model fit (R 2 = 0.80) and precision (RMSE = 151.6 (25.9%)), but the Bias increased three-fold to 9.4 Mg ha −1 (1.6%) (figure 5(c)), which highlighted the need for subsequent bias correction.

Bias correction
The ratio of FIA AGB/CMS AGB was calculated for each year (e.g. figure 7(a)). We observed an unanticipated trend in these ratios, which tended to increase over time as function of map year ( figure 7(b)). Thus, we constructed a simple linear model (c = 0.005466y-10.265) of the annual correction factor (c) by year (y) to calibrate the annual AGB maps (figure 6) to unbiased annual FIA estimates of AGB at the plot level. Applying the simple linear model to the annual 2000-2016 maps resulted in the closest match to FIA estimates at the plot level (figure 7(c)) and upon aggregation to the county level ( figure 7(d)).  (b) Regional model AGB estimates with bias correction (e.g. 2011). The 2009 PALSAR forest/non-forest (F/NF) mask was applied in both cases. The confidence interval positioned at the mean is within the region of similarity described by the shaded grey polygon defining the region of similarity, meaning the intercept of the observed/predicted regression is significantly similar to 0; therefore the estimates are unbiased. The black, solid line of best fit is within the angle described by the dashed black lines defining the region of similarity, meaning the slope of the observed/predicted regression is significantly similar to 1; therefore the estimates are proportional. It is important to clarify that the spatially discontinuous AGB maps generated with the landscape model were not calibrated with the bias correction; nevertheless, they matched independent AGB estimates from FIA very well and were statistically equivalent ( figure 8(a)). Only the continuous, annual AGB maps generated with the regional model were calibrated with the bias correction, after which these AGB estimates were also statistically equivalent to independent AGB estimates from FIA ( figure 8(b)).

Forest/non-forest mask
The choice of F/NF mask to use influenced the AGB maps locally, which was evident at the county level (figure 10). We assessed nine F/NF masks: five other global PALSAR masks besides the 2009 mask selected, and also spectral data derived masks, one produced by the USGS Gap Analysis Project (GAP; USGS 2011) at the national scale as well as two regional FIA masks (Blackard et al 2008, Blackard 2009). A few disparities were evident at the county level, where the selected 2009 PALSAR mask differed from the mean of all nine F/NF masks by more than a standard deviation, but these localized effects of the mask did not translate into differences at the state level for any year (figure 10). By applying the F/NF mask as the last step in our CMS workflow, any errors introduced by the F/NF mask were minimized. Aggregated county-level estimates of AGB density (Mg ha −1 ) in figures 9 and 10 appear more scattered than aggregated county-level estimates of total AGB (Tg) in figure 7(d) because in the former case smaller counties have equal weight, and in these smaller counties the aggregated county-level AGB estimates are less stable because they are based on less forested land and fewer FIA plots.

Discussion
Our overarching strategy was to develop a CMS that would maximize the utility of project datasets contributed by stakeholders and deliver to them unbiased regional maps that are temporally and spatially consistent. Given the ad hoc nature of various lidar-based forest inventory projects initiated by diverse stakeholders acting independently to achieve multiple forest management objectives, it was inevitable that projects would differ with regard to plot sampling protocols and lidar acquisition parameters. Nevertheless, AGB estimates by the landscape model ( figure 5(a)) were unbiased (figure 8(a)). These estimates were hence highly suited for training the regional model ( figure 5(b)), supporting our first hypothesis. Using a simple linear model to calibrate the regional maps (figure 7) could be thought of as the simplest model-assisted strategy for unbiased AGB estimation ( figure 8(b)) (Gregoire 1998, Ståhl et al 2016. The bias correction improved the accuracy of the maps (by definition) but did not alter the variance structure of the data, thus leaving the model precision unchanged because the pixel-level standard deviation in AGB scales with the AGB mean calculated across the same 500 RF trees.
Our mapped AGB estimates from both the landscape and regional models were similarly unbiased and proportional compared to independent FIA AGB estimates as illustrated using equivalence plots (figure 8). This supports earlier findings that lidar-and Landsat-based AGB estimates are comparable upon aggregation to the watershed scale (Ohmann and Gregory 2002, Bell et al 2015, 2018 or larger units (e.g. counties), even if the landscape model AGB estimates largely derived from lidar are more locally accurate. As expected, AGB largely predicted from lidar metrics did not asymptote ( figure 5(a)). However, spectral variables lose sensitivity at AGB densities above 300-400 Mg ha −1 (Zhang et al 2014) or 450-500 Mg ha -1 (Kennedy et al 2018a); we attribute this lack of a saturation effect in the regional model to the inclusion of climate metrics, which proved to be the most important predictors in the regional model ( figure 4(b)). Thus, climate metrics helped capture without asymptote (figures 5(b) and (c)) the higher biomass forests that commonly occur west of the Cascades in our northwestern USA study region (figure 6), supporting our second hypothesis.
Our prototype CMS approach is bottom-up, beginning with the tree measurements. FVS uses expansion factors to account for different inventory plot sizes (Dixon 2002(Dixon , 2018, which works well to ensure consistent stand structure attributes (e.g. AGB) across different plot sizes. Traditional forest inventories typically use variable-radius plot designs for the efficiency gained in the field (Avery and Burkhardt 2002, Husch et al 2003, Packard and Radtke 2007; however, we included only fixedradius plots in our project reference database because variable-radius plots do not allow the lidar returns to be precisely binned within a defined plot footprint to ensure clean relationships to the tree measures in the spatial domain (also provided the plot geolocations are accurate) . While we excluded variable-radius plots from our project database, it is worthwhile noting that there is evidence for successful integration of variable-radius plot data with lidar data (Deo et al 2016).
We also sought to limit plot-scale sources of noise in the temporal domain, by excluding plots that had been disturbed between collection dates, and restricting temporal offsets between tree and lidar measures to ±3 years, a reasonable compromise, but this criterion for our project reference database could be easily relaxed or tightened as warranted. Maintaining both the tree and lidar measurements in the project database ensures that these plot-level reference data remain relevant at a regional scale, regardless of how long ago the reference data were collected, even as forest conditions change. While vegetation conditions can and do change at any given location, the broad range of conditions in the regional population should not change, even as disturbance regimes may shift the relative proportions of forest conditions in response to changing disturbance and climate regimes (Cohen et al 2016, Dolan et al 2017, Fekety et al 2020a. Our project reference database, and indeed FIA plot data as well, are also FVS-ready in the sense that the tree measurements can be projected to estimate tree growth and future AGB stores, and FVS also has options to consider the influence of disturbance (fire, insects, disease) (Dixon 2002(Dixon , 2018 and climate (Crookston et al 2010(Crookston et al , 2014 on future forest structure and composition (Crookston Buma andWessman 2013, Fekety et al 2020a). Furthermore, FVS can output any number of stand structure attributes (tree density, basal area, volume, etc) besides AGB; our project reference database, like the FIA plot database, can also provide these attributes (Fekety et al 2018). Finally, we can continue to add future project datasets contributed by stakeholders to our project reference database, such that it is a 'living' database that continually expands the range of forest conditions represented with tree and lidar measurements.
While airborne lidar data are highly valued for accurate forest structure characterization, other remote sensing technologies are emerging that could be incorporated into our prototype CMS. Digital aerial photography (DAP) and Structure from Motion (SfM) techniques including methods collected from low-cost drones provide a cheaper alternative to lidar, which could fit well into this CMS, provided that the derived data products can be consistent (Strunk et al 2019). Studies have shown DAP measures of canopy height may be comparable to those derived from lidar first returns, although canopy density metrics are probably not comparable (White et  The best F/NF mask to choose for AGB accounting and MRV is likely to change as new remote sensing data products become available. We preferred the 2009 PALSAR mask not only for its global coverage but also for its independence from all other datasets used in our CMS. Global maps of forest cover change derived from PALSAR have compared well to the United Nation's Food and Agricultural Organization (FAO) global Forest Resource Assessments (FRA) in tropical rainforests of Southeast Asia (Dong et al 2012) and South America (Qin et al 2017). PALSAR can be used to detect forest cover changes due to either fragmentation in closed canopy forests (Qin et al 2017) or encroachment in open canopy forests (Wang et al 2018). Timing of dataset acquisition is another consideration for accurate acquisition of canopy structure using PALSAR. Deng et al (2014) found that acquiring data during drier seasons helped mitigate the influence of soil and surface moisture on L-band backscatter, such that it more accurately detects biomass. Fusion between PALSAR and Landsat images has clear potential for more accurate forest cover mapping (Qin et al 2016).
We must note that the estimated uncertainties in AGB by our bottom-up approach encompass but do not specifically account for the different sources of error, namely: (1) tree measurement errors, plot X, Y location errors, and allometric errors within our project reference plot database; (2) lidar X,Y,Z measurement and georegistration errors; (3) DEM elevation and climate interpolation errors; (4) landscape model errors; (5) Landsat measurement and georegistration errors; and (6) spatiotemporal and model errors associated with the GLAS and SRTM dataderived canopy height product (Simard et al 2011). These additive sources of error collectively degrade the R 2 and RMSE fit statistics reporting model precision throughout this paper, but especially the regional AGB estimates in figures 5(b), (c) and 8(b).
However, we do assert that calibrating the maps near the end of our workflow using unbiased FIA plot estimates effectively minimized bias, supporting our third hypothesis. The global accuracy of the maps is evident upon aggregation to the county level (figure 7(d), 9, 10). Locally, our AGB estimates vary less at the county level than the AGB estimates from FIA (figure 9). Future research could more thoroughly evaluate local accuracies in AGB mapped by our CMS compared to other methodologies.

Conclusion
The strength of our project reference database is that it precisely matches tree and lidar measurements at the plot scale in space and time, while acknowledging that it is a biased sample of the regional population. We proceeded to use it to train RF models for predicting AGB from lidar (and topographic and climate) metrics across project landscapes, leveraging the strength of lidar for characterizing forest structure variation at finer (plot/pixel) scales. We in turn used these AGB maps to train a regional model predicting AGB from climate, topographic, and LandTrendr metrics to generate regional AGB maps on an annual time step, thus leveraging the strength of Landsat for capturing annual forest cover dynamics over many years. The regional AGB maps also overcame the spectral saturation limitation of Landsat in high biomass forests by including climate and topographic metrics as predictors, which can capture the large productivity gradients across the northwestern USA non-asymptotically. Our CMS leveraged the strength of FIA plots as an unbiased estimator of forest conditions to calibrate our regional, annual AGB maps, therefore overcoming our initial constraint of using a biased AGB sample as the basis of our prototype CMS. Our AGB maps benefit not just the contributing stakeholders but a much broader group of users, particularly USFS regional planners, and provide critical information for carbon monitoring.

Acknowledgments
This work was primarily funded by a NASA Carbon Monitoring Systems (CMS) Program Award (NNH15AZ06I) with secondary funding for additional WA and OR lidar processing provided by the Joint Fire Science Program (JFSP) Fire and Smoke Model Evaluation Experiment (FASMEE) Project (15-S-01-01). University partners were funded through Joint Venture Agreements with the University of Minnesota (15-JV-044), Colorado State University (16-JV-061), Oregon State University (15-JV-041), and the University of Idaho (15-JV-040).
We greatly appreciate our stakeholders who contributed both field plot and lidar project datasets from several USFS National

Data Availability Statement
The data that support the findings of this study are openly available at the following DOIs. The NASA Oak Ridge National Lab (ORNL) Distributed Active Archive Center (DAAC) hosts both the regional 2000-2016 AGB maps (Fekety and Hudak 2019;10.3334/ORNLDAAC/1719) and the landscape AGB maps from which they were derived (Fekety and Hudak 2020;10.3334/ORNLDAAC/1766). Both AGB map products are accompanied by maps of RF model uncertainty, also at 30 m resolution. Publicly available project reference plot data are archived on the USDA Research Data Archive (Fekety et al 2020b; www.fs.usda.gov/rds/archive/catalog/RDS-20 20-0026).