Mapping Russian forest biomass with data from satellites and forest inventories

The forests of Russia cover a larger area and hold more carbon than the forests of any other nation and thus have the potential for a major role in global warming. Despite a systematic inventory of these forests, however, estimates of total carbon stocks vary, and spatial variations in the stocks within large aggregated units of land are unknown, thus hampering measurement of sources and sinks of carbon. We mapped the distribution of living forest biomass for the year 2000 by developing a relationship between ground measurements of wood volume at 12 sites throughout the Russian Federation and data from the MODIS satellite bidirectional reflectance distribution function (BRDF) product (MOD43B4). Based on the results of regression-tree analyses, we used the MOD43B4 product to assign biomass values to individual 500 m × 500 m cells in areas identified as forest by two satellite-based maps of land cover. According to the analysis, the total living biomass varied between 46 and 67 Pg, largely because of different estimates of forest area. Although optical data are limited in distinguishing differences in biomass in closed canopy forests, the estimates of total living biomass obtained here varied more in response to different definitions of forest than to saturation of the optical sensing of biomass.


Introduction
Although systematic forest inventories have been used to define the amount of carbon held in northern mid-latitude forests (Goodale et al 2002), the spatial distribution of carbon stocks is not well characterized. The spatial distribution is, nevertheless, important for determining the emissions of carbon that result from disturbance or deforestation and the uptake of carbon that results from recovery and growth (Houghton 2005). Estimates of carbon sources and sinks based on forest inventories are variable, especially for Russia and the former Soviet Union, where the net carbon balance ranges between a source of 0.5 Pg C/yr and a sink of 1.02 Pg C/yr (review of 15 studies by Shvidenko et al (1996), Goodale et al (2002)). Not only does Russia represent the largest political unit in the northern hemisphere and contain the largest stocks of terrestrial carbon (Apps et al 1993, Goodale et al 2002, it is also the country where estimates of net carbon flux are most divergent. The wide range of flux estimates based on Russian forest inventory data is ironic because most of the estimates are based on essentially the same data. The federal forest service of Russia collects detailed stand level information on stemwood volumes over about 30 million ha (or about 4% of the forest area) annually (Kukuev et al 1997). Changes in the forests (areas harvested, burned, planted) are collected by forest management enterprises (leskhozes) and aggregated to the provincial and national level. The variability among published estimates of biomass reflects not only updating and aggregation, but different methods for converting wood volumes to biomass, and biases introduced through economic and political incentives to exaggerate accomplishments (e.g., land areas planted) and to under-report problems (e.g., forest area burned) (Alexeyev et al 2004).
If a correlation could be demonstrated between contemporary satellite data and forest biomass, the relationship might be used to distribute forest biomass across all of Russia in a consistent, transparent manner. In this paper we report an attempt to use satellite data to map the biomass of Russian forests for the year 2000. We used MODIS satellite data (and MODIS products), calibrated against inventory data from individual stands sampled between 1998 and 2000, to estimate forest biomass across all of Russia.

Selection of sampling sites
The territory of Russia is commonly divided into four geopolitical regions: European Russia (including the Ural Mountains), Western Siberia, Eastern Siberia and the Far East (figure 1).
In contrast to these longitudinally arranged geopolitical regions, the major forest biomes of the Russian Federation generally extend along east-west parallels (figure 1): northern taiga (1), middle taiga (2), southern taiga and mixed forest (3), temperate forest (4), and forest steppe (5). The intersection of the four geopolitical regions with the five biomes defines 15 major geographical units.

Forest inventory data
As part of NASA's Land Cover/Land Use Change (LCLUC) program, we developed collaborative relationships with a number of Russian forest scientists. These scientists were crucial for obtaining local forest inventory data, within 12 of the 15 geographical units (figure 1, table 1). The specific locations for the sites were determined from the availability of forest inventory data. Sites had to be large enough to accommodate at least 1000 forest stands (polygons) of 3 ha or greater.
For each site, we obtained a digital map of forest polygons (stands) and the inventory data characterizing each polygon. Data included, for each polygon: area, forest type, dominant species, species composition, age, height, stocking density, volume of live stem wood (growing stock), and volume of snags and logs. Forest types or species groups included pine, spruce, mixed conifer, deciduous, and mixed (coniferous and deciduous) forest. Altogether, for the 12 sites, we obtained information for 42 182 polygons, covering a total area of 750 907 ha. Polygons ranged in size from 0.7 to 1503 ha, with an average of 18 ha/polygon.
Forest growing stock (m 3 of wood per hectare) was converted to biomass (Mg ha −1 ) for each polygon using the allometric equations of Alexeyev and Birdsey (1998). The coefficients for the equations varied with forest age, species group, and region. They defined total forest biomass, including living above-and below-ground tree biomass and under-story vegetation. Carbon was assumed to be 0.5× biomass.

Evaluating the relationship between forest biomass and MODIS products
We used the MODIS bidirectional reflectance distribution function (BRDF) product (MOD43B4) as the independent variable for predicting biomass. The MOD43B4 product is corrected for the off-nadir characteristics of scanning sensors and for atmospheric haze and aerosols. The product is a 16day composite of MODIS reflectance at a spatial resolution of 1 km. We used the composite for mid-July 2000, and resampled to 500 m resolution using a nearest-neighbor algorithm.
To geo-register the satellite and ground data at each site, we used Landsat ETM+ data, along with spatially explicit GIS layers, including the hydrological network, stand polygon boundaries, and geographic identification. We then superimposed the MODIS product over the geo-referenced map of forest polygons. Because the stand polygons varied in size, some MODIS cells included a single polygon and some included multiple small polygons. When a cell included more than one forest polygon, we calculated an area-weighted mean biomass for each 500 m × 500 m MODIS cell (figure 2). Cells that contained polygons of non-forest were omitted from the training procedure. The calculation of weighted mean biomass was an aggregation procedure that often prevented us from assigning a particular forest type to a MODIS cell. Thus, we did not distinguish among species groups or forest types but, instead, lumped all species groups together as forests. The aggregation procedure also reduced the effective number of forest 'polygons' (now aggregated to 500 m × 500 m cells), at some sites substantially.
Because satellite reflectance data (predictive variables) are highly intercorrelated, and the response (biomass) is potentially non-linear, standard multiple regression techniques were unsuitable. Instead, we used bootstrapped regression trees to develop associations between mean reflectance and biomass. Specifically, we used Breiman and Cutler's random forests (RF) ensemble prediction method (Breiman 2001, Liaw andWiener 2002). Using the RF algorithm, we built 500 regression trees using different random samples of the data. Model error using the RF algorithm was quantified with the one-third of the data randomly excluded from the construction of each of the trees. The analysis was performed with the randomForest package (Liaw and Wiener 2002) in the R programming environment (R Development Core Team 2006).
We used two different land-cover maps, the GLC2000 land-cover product (Bartalev et al 2003) and the MOD12Q1 land-cover product (Schaaf et al 2002), to identify areas of forest and non-forest throughout Russia. The final step was  to assign a biomass value to each 500 m × 500 m cell of forest using the results from the ensemble of 500 regression trees.

Results
For individual sites, the fraction of the variance explained by the regression trees varied between 0.01 and 0.67 (table 1  and figure 3). When data from all of the sites were lumped in the regression trees, the predictive capability of the model (i.e., variance explained) was 0.61. In general, the models underestimated polygons with high biomass and overestimated polygons with low biomass.
The highest values of biomass appeared in the middle and southern taiga. The lowest values were in the northern taiga and the forest steppe (savanna) at the southern limit of forest distribution (figure 4). The frequency distribution of biomass classes in the forests of Russia was skewed toward forests with lower biomass, especially with the GLC2000 land-cover product (figure 4, histograms).

Discussion
The number and distribution of the 12 sites seem to have been adequate for capturing the spectral signatures of forests throughout Russia, as calculated by Euclidian distance between the MODIS spectral bands. Despite the good spectral coverage given by the sites, however, the models of biomass developed for individual sites never explained more than 45% of the variation in biomass at other sites and often explained less than 10% of the variation. When the combined model developed with data from all 12 sites was tested at individual sites, its explanatory power varied between 0.04 and 0.71.

Why were the satellite data so poor at explaining biomass at some sites?
The error of our estimates of biomass was ∼40%. In other words, 39% of the variability in predicted biomass was unexplained by the regression model. It is not surprising that satellite optical data were poor at distinguishing variations in biomass, especially in closed-canopy forests. We found that more of the variance in biomass was explained when we considered only young forests (<20 years) (presumably with open canopies), confirming that at least part of the difficulty in predicting biomass resulted from differences in biomass under closed canopies.
A second limitation to the prediction of above-ground biomass with MODIS data seems to have been the extent to which the forest polygon data matched the spatial resolution of MODIS cells. Table 1 ranks the test sites by the ability of the regression-tree approach to predict biomass at that site (column 3, r 2 ). The goodness-of-fit was roughly correlated with the average size of the forest polygons, although Karelia, Magadan, and Murmansk were exceptions.
It was not the number of polygons that limited the method's success. We used subsets of the polygons in the Krasnoyarsk Yart site to test whether the predictive capability of the regression trees was sensitive to the number as well as the size of training polygons. The relationship between observed and predicted biomass was robust with as little as 3% of the data.
The errors of the forest inventory data are uncertain. One crude estimate of error may be obtained from two recent estimates of Russian forest biomass. Alexeyev and Birdsey (1998) and Shvidenko and Nilsson (2003) used the same inventory data and the same forest area, yet reported total biomass of 56 Pg and 68.8 Pg, respectively (table 2). The difference (20% of the mean) is probably a conservative estimate of 'inventory' error because it pertains only to that part of the error related to allometry (calculation of biomass from wood volumes).
Other studies have used optical satellite data, calibrated with data from forest inventories, to determine biomass over large areas in temperate zone and boreal regions (Myneni et al 2001, Baccini et al 2004, Zhang and Kondragunta 2006, Muukkonen and Heiskanen 2007 and in the tropics (Foody et al 2001, Saatchi et al 2007. These studies suggest that our predictive model might have been improved (1) had we used finer spatial resolution data (e.g., ASTER data) for integrating ground measurement with the coarser resolution MODIS data  (2007) was improved when optical data were used in combination with radar data.

How do these results compare with other estimates?
We are unaware of other maps of forest biomass or growing stock for the forests of Russia. The first phase of the European SIBERIA project recently used synthetic aperture radar (SAR) data to estimate forest biomass over a 900 000 km 2 area in Central Siberia, but the approach failed to distinguish among biomass classes greater than 80 m 3 ha −1 (∼64 Mg biomass ha −1 ) (Gaveau et al 2003, Wagner et al 2003. According to our analysis, ∼60% of the forest area in Russia had biomass values greater than 64 Mg ha −1 . Despite the lack of biomass datasets at high spatial resolution, several studies have estimated the total carbon stocks in forests for broad administrative units. Alexeyev and Birdsey (1998), for example, reported the biomass of forests for Oblasts, Kray, or Republics (71 of them across Russia). When we summed the biomass values for all of the forested cells within these same units, our results gave remarkably similar totals (figure 5).
Our estimates of total forest biomass for all of Russia (46 and 67 Pg biomass for the MODIS and GLC2000 maps, respectively) include the recent estimate by Alexeyev and Birdsey (1998) (56 Pg), and our higher estimate is similar to that of Shvidenko and Nilsson (2003) (69 Pg) (table 2). These comparisons are not very satisfying, however, because total biomass is sensitive to forest area. Average forest biomass, in contrast, allows a better comparison, and our higher estimate (88.2 Mg ha −1 ) is similar to the estimate by Shvidenko and Nilsson (2003) (88.8 Pg).
It is important to recognize that the two estimates of total forest biomass we report result from hugely different estimates of forest area. The total area of Russian forests was 523.6 × 10 6 ha according to the MODIS land-cover product . However, neither estimate included the areas of forest contained in the mixed category of 'forest and agriculture', so both may be underestimates. The major difference between the two satellite-derived estimates was their treatment of woodlands and shrublands. Such lands are included as forests in the GLC2000 product and excluded in the MOD12Q1 product (figure 4). The difference is particularly conspicuous in the 'northern taiga' and includes our site in northern Murmansk. The Russian forest inventories recognize an intermediate area of forest in their classification (771 × 10 6 ha) (table 2).

Conclusion
The attempt to use MODIS data to distribute forest biomass across Russia was only partially successful. Positive aspects included the observations (1) that MODIS data and forest biomass were generally well correlated at sites where the forest polygons were larger than MODIS cells (500 m × 500 m); (2) that the spectral signatures from the 12 training sites selected in this study seemed to represent forests throughout the country; (3) that the map of forest biomass produced from this work appeared reasonable in terms of the distribution of biomass classes; and (4) that the total forest biomass for individual political units compared well with estimates based on data from the forest inventories of Russia.
The less successful aspects of the work included the observations (1) that MODIS data and forest biomass were not well correlated at sites where forest polygons were smaller than MODIS cells and even in some sites with large forest polygons; and (2) that predictive models of forest biomass developed at individual sites did not apply outside the borders of the training site. It is not news that optical data are insensitive to biomass under closed canopies. Nevertheless, MODIS data did capture gross differences in biomass across broad environmental gradients and across obvious differences in forest structure (for example, non-forests, open forests, young forests, and old forests). The maps of forest biomass obtained through this analysis will help constrain estimates of carbon emissions associated with changes in land use, fires, and other disturbances (Houghton 2005).
The difference between the maps of forest cover confirms the importance of determining biomass independently of land cover. Biomass is a continuous variable, with a wide range of values within each ecoregion or ecosystem. It is not well characterized by discrete classes of land cover or forest type. The capability of determining forest biomass from space would eliminate much of the arbitrariness of distinguishing forest from woodland, and of defining forests and deforestation for carbon accounting. Sources and sinks of carbon are the result of changes in biomass (carbon stocks), whatever the cause. They are better estimated by measuring changes in carbon stocks directly than by observing transitions across an arbitrary threshold of forest-non-forest.