Use of various remote sensing land cover products for plant functional type mapping over Siberia

High-latitude ecosystems play an important role in the global carbon cycle and in regulating the climate system and are presently undergoing rapid environmental change. Accurate land cover data sets are required both to document these changes and to provide land-surface information for benchmarking and initializing Earth system models. Earth system models also require specific land cover classification systems based on plant functional types (PFTs), rather than species or ecosystems, and so post-processing of existing land cover data is often required. This study compares multiple land cover data sets over Siberia, against one another and with auxiliary data, to identify key uncertainties that contribute to variability in PFT classifications and that would introduce errors in Earth system modeling. Land cover classification systems from GLC 2000, GlobCover 2005 and 2009, and MODIS collections 5 and 5.1 are first aggregated to a common legend, and then compared to high-resolution land cover classification systems, vegetation continuous fields (MODIS VCFs) and satellite-derived tree heights (to discriminate among sparse, shrub, and forest vegetation). The GlobCover data set, with a lower threshold for tree cover and taller tree heights and a better spatial resolution, tends to have better distributions of tree cover compared to high-resolution data. It has therefore been chosen to build new PFT maps for the ORCHIDEE land surface model at 1 km scale. Compared to the original PFT data set, the new PFT maps based on GlobCover 2005 and an updated cross-walking approach mainly differ in the characterization of forests and degree of tree cover. The partition of grasslands and bare soils now appears more realistic compared with ground truth data. This new vegetation map provides a framework for further development of new PFTs in the ORCHIDEE model like shrubs, lichens and mosses, to represent the water and carbon cycles in northern latitudes better.
Updated land cover data sets are critical for improving and maintaining the relevance of Earth system models for assessing climate and human impacts on biogeochemistry and biophysics. The new PFT map at 5 km scale is available for download from the PANGAEA website at doi:10.1594/PANGAEA.810709.


Introduction
The Siberian region has been a focus of research attention in recent years because it is considered a hot spot for climate change studies (see for example Lenton et al., 2008). The region is currently undergoing a warming trend with impacts already visible in the environment, its vegetation and soils (Lucht et al., 2002). Pronounced climatic warming in Siberia (Chapin III et al., 2005) has had large implications for vegetation (Euskirchen et al., 2009), changes that have already been confirmed by numerous studies at various scales. For example, Tape et al. (2006) demonstrated, using aerial photography, an expansion of deciduous shrubs in tundra areas in northern Alaska during the last 50 yr. Satellite data sets and especially Normalized Difference Vegetation Index (NDVI) products have also documented landscape-scale greening signals and/or phenological changes, in relation with air temperature (see for example, Forbes et al., 2010; Hüttich et al., 2007; Delbart et al., 2005; Delbart and Picard, 2007; Myneni et al., 2001). However, the response of continental-scale vegetation to climate warming is not simple because different processes and feedbacks linked to snow, permafrost, soil moisture, albedo, and species competition (Chapin III et al., 2005; Loranty and Goetz, 2012) lead to large uncertainties in predicting and attributing ecosystem and land cover change dynamics.
Published by Copernicus Publications. C. Ottlé et al.: Use of various remote sensing land cover products for PFT mapping over Siberia
One approach to better understand the role of interacting processes and how the various species compete for water, light and nutrients is the use of ecosystem models. Ecosystem models are now able to represent the main high-latitude physical and biogeochemical processes and especially permafrost and snow modeling and vegetation interactions, as well as vegetation dynamics, but these models require a correct representation of current land coverage as initial conditions or for benchmarking dynamic global vegetation models (Quaife et al., 2008).
In northern Eurasia, the main challenge for ecosystem modelers is to be able to differentiate short- from high-statured vegetation, as well as deciduous from evergreen phenology. Even at this coarse thematic resolution, very different energy, water and carbon cycling processes are represented. For example, vegetation height is directly related to surface roughness and consequently affects turbulent fluxes; in addition, vegetation height can alter the effects of snow on ecosystem energy budgets with implications for surface albedo and related feedbacks. The deciduous character of shrubs or trees is also very important for the calculation of spring and autumn water and carbon fluxes and their seasonal variations.
Improved mapping of current land cover is a high priority for representation within Earth system models, yet there are several challenges that need to be considered. Remote sensing instruments provide regular data at global scales, with increasing spatial resolution, and have been used for years to map land cover. Thus, a number of global products have been derived over the last 20 yr. They are used for a wide range of environmental studies and especially in climate models to characterize the land surface and its physical and biogeochemical properties and to determine the energy and matter transfers to the atmosphere. In such models, for simplification, to reduce the computing time, and to develop testable hypotheses, the various ecosystems are grouped into plant functional types (PFTs), with a limited number of types, usually around 10 to 15. As an example, the ORCHIDEE dynamic global vegetation model (DGVM) (Krinner et al., 2005), part of the Institut Pierre Simon Laplace (IPSL) Earth system model (LMDZ, Hourdin et al., 2006; Dufresne et al., 2013), distinguishes 12 PFTs to represent the global land surface. Moreover, the reclassification into PFTs is done with constant, but qualitative, rules defined across climate zones (Poulter et al., 2011), which can lead to significant uncertainty in the class fractions.
The current ORCHIDEE PFT map is based both on the International Geosphere-Biosphere Programme (IGBP) 1 km global land cover map (Belward et al., 1999) reduced by a dominant-type method to 5 km spatial resolution, and on the Olson classification (96 types) (Olson et al., 1983). This spatial resolution is clearly not sufficient for future local-scale studies focused on the environmental impacts of global warming and land use in Siberia and for development perspectives in terms of parameterization of biogeochemical processes. Therefore, our objective in this study is to develop a new map at 1 km resolution based on recent land cover products, suitable for Earth system modeling, which could be further refined if new PFTs are developed.
For that purpose, different remote sensing land cover products are available. They have been developed from multispectral and multitemporal imagery, in order to separate the various ecosystems presenting different spectral properties and seasonal variations. At medium resolution (hundreds of meters to kilometers), the most popular and most recent products are the GLC 2000 land cover database (Bartholomé and Belward, 2005) based on the SPOT-4 VEGETATION instrument, the GlobCover land cover products (Arino et al., 2005, 2012) derived from the Envisat/MERIS radiometer and the MODIS land cover data sets (Friedl et al., 2002, 2010), based on the Terra and Aqua MODIS instruments.
These products have been compared in previous works and, for some of them, over Siberia. For example, Jung et al. (2006) developed the Synergetic land cover product (SYNMAP) dedicated to Earth system modeling, based on the merging of GLC 2000 and MODIS 4.0 products. The final map, which separates 48 classes, is available at 30 arcsec scale (∼ 1 km). Frey and Smith (2007) inter-compared AVHRR (Advanced Very High Resolution Radiometer) and MODIS products at 1 km scale in West Siberia and highlighted the weaknesses of global land cover (LC) products in northern wetland environments. Urban et al. (2010) focused on pan-Arctic land cover mapping and combined GlobCover, SYNMAP, MODIS LC and vegetation continuous fields (VCFs) and additional regional products like fire products, to create a new harmonized map separating four classes: trees, shrubs, herbaceous and barren areas. Sulla-Menashe et al. (2011) developed the Northern Eurasia Land Cover (NELC) database from supervised classification of MODIS data, which allows the separation of 15 land cover classes including land use (urban, agriculture), wetlands and tundra classes. Meanwhile, Pflugmacher et al. (2011) cross-compared GLC 2000, GlobCover and MODIS products as well as Landsat-based reference maps in northern Eurasia. The map legends were converted to a common classification on the basis of dominant life-form type (LFT). The results show regional disagreements among products and the difficulty of mapping shrubs and herbaceous vegetation types. More recently, Schepaschenko et al. (2011) produced a highly detailed land cover/land use data set for Russia essentially based on the GLC 2000 data set at 1 km resolution combined with VCFs from MODIS, soil and vegetation databases and different inventories and statistics.
All these studies found significant differences between data sets and highlighted strengths and weaknesses of each product, but none concluded that one product was superior to the others. Moreover, since for most of these works, the final objective was not PFT mapping, the methodology developed for the cross-comparison and the final mapping could not be used directly for our study. Further, no comparison to date has included the MODIS 5.1 product, which benefits from a reprocessing of the complete MODIS archive, with an up-to-date training database, and an extension of the land cover data to 2011. Therefore, it was necessary to develop a new comparison of the most recent land cover products available for Siberia, to build dedicated aggregation rules for ORCHIDEE PFT mapping and to generate a new PFT map over Siberia at 1 km scale.
This paper presents the methodology used to compare medium-resolution remote sensing land cover products for Siberia. The evaluation was performed after aggregating the different land cover data sets to the same spatial scale and under the same harmonized legend. A comparison of thematic differences was conducted to highlight areas of disagreement, and we developed a methodology to generate the PFT distributions. Our results are presented in terms of product comparison and final PFT mapping, with discrepancies explicitly addressed.

Methods
We acquired recent land cover satellite products available at medium spatial resolution (300 to 1000 m) and focused our comparison on Siberia. The data sets are presented in Table 1 with their specifications in terms of spatial resolution, time of acquisition, geographic projection, and thematic information, including the number of land cover classes with the respective classification legends in Table 2. The global products include the GLC 2000 product (Bartholomé and Belward, 2005) developed in northern Eurasia by Bartalev et al. (2003), the GlobCover 2005 and 2009 products (Bicheron et al., 2006; Arino et al., 2005, 2012) and the MODIS land cover type collection 5.0 and 5.1 (Friedl et al., 2002, 2010). The first four products have already been compared and evaluated in various regions and at different scales, using ground truth measurements, and have shown strengths and weaknesses (See and Fritz, 2006; Jung et al., 2006; Frey and Smith, 2007; Kaptué Tchuente et al., 2010, 2011; Pflugmacher et al., 2011; Schepaschenko et al., 2011).
Our first goal was to identify the most suitable product for further PFT mapping. To achieve this, we assessed the land cover classification methodology and land cover class definition in terms of their capacity to represent the spatial heterogeneity and in terms of the spatial agreement between products. To assess these criteria, we compared various data sets at different temporal and spatial scales, including high-resolution optical images like Landsat-TM (Thematic Mapper) products.

Land cover products
The GLC 2000 land cover map was developed for different parts of the world with regional experts before applying a generalized legend to create a global land cover map. In this work, we used the regional product over northern Eurasia developed by the European Commission's Joint Research Centre and the Russian Academy of Science's Center for Forest Ecology and Productivity (Bartalev et al., 2003; Bartholomé and Belward, 2005). The land cover map was produced from daily observations provided by the SPOT-4 VEGETATION instrument for the year 2000, at 1/112° ground sampling distance (GSD), corresponding to a ∼ 1 km spatial resolution. The automated classification process allows separating 22 land cover types based on local expert opinion following the land cover classification system (LCCS, Di Gregorio and Jensen, 1998) of the Food and Agriculture Organization (FAO). The map is available from the JRC Land Resource Management Unit website (http://bioval.jrc.ec.europa.eu), in plate carrée (equirectangular) projection with map datum WGS84.
The GlobCover land cover products (GlobCover 2005 and GlobCover 2009) were developed within the framework of European Space Agency (ESA) projects (Bicheron et al., 2006, 2008; Arino et al., 2005, 2012). They are both based on Envisat/MERIS data acquired in years 2005 and 2009 respectively and available from the ESA GlobCover project website (http://ionia1.esrin.esa.int). The maps are available in plate carrée (WGS84) projection with a 300 m spatial resolution (1/360° GSD), under the same class definition as GLC 2000 (i.e., LCCS but with a larger number of classes (40) for the regional product available for eastern Eurasia in 2005), whereas the GlobCover 2009 product is available only with a global legend that separates 22 classes, fully compatible with the GLC 2000 one. GlobCover uses a fully automated unsupervised classification approach using GLC 2000 as the main auxiliary data set for the class interpretation.
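The correspondence between angular ground sampling distance and metric resolution quoted above (1/112° for about 1 km, 1/360° for about 300 m) can be checked with a short calculation. The following is a minimal sketch, assuming a spherical approximation with the WGS84 equatorial radius; the function names are ours and purely illustrative. It also shows that on a plate carrée grid the east-west pixel size shrinks with the cosine of latitude, which matters at Siberian latitudes.

```python
import math

# Semi-major axis of the WGS84 ellipsoid, used here as a spherical radius
# (an approximation adopted only for this illustration).
WGS84_EQUATORIAL_RADIUS_KM = 6378.137


def gsd_km(pixels_per_degree: float) -> float:
    """North-south pixel size in km for a grid with 1/N degree spacing."""
    km_per_degree = 2.0 * math.pi * WGS84_EQUATORIAL_RADIUS_KM / 360.0
    return km_per_degree / pixels_per_degree


def gsd_east_west_km(pixels_per_degree: float, latitude_deg: float) -> float:
    """East-west pixel size in km at a given latitude on a plate carree grid."""
    return gsd_km(pixels_per_degree) * math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    for n in (112, 360):
        print(f"1/{n} deg: {gsd_km(n):.3f} km north-south, "
              f"{gsd_east_west_km(n, 60.0):.3f} km east-west at 60 N")
```

Under these assumptions a 1/112° pixel is close to 1 km north-south but only about half that east-west at 60° N, so pixel counts on such a grid are not directly proportional to ground area.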
Finally, the MODIS land cover products developed by the Boston University Department of Geography and Center for Remote Sensing (https://lpdaac.usgs.gov) are based on NASA MODIS instruments on board Aqua and Terra platforms. They are available at annual time step, from 2001 until 2007 for the C5.0 product (Friedl et al., 2002, 2010) with a spatial resolution of 500 m (1/240 Guide for the MODIS Land Cover Type Product, MCD12Q1 (which is available at http://www.bu.edu/lcsc/files/2012/08/MCD12Q1_user_guide.pdf).

Auxiliary data sets
Several auxiliary products representing different features of the land surface were used to assist in the evaluation of the global products (see Table 1). These products first helped us in the interpretation of the different legends and later assisted in the merging process, which permitted us to build the harmonized legend. Among them, two land cover maps based on aerial photointerpretation and ground truth data have been used to better understand the class significance and evaluate product accuracy and spatial variability representation. For these two products, the digital database was not accessible and only graphical maps were used. The first one is the Circumpolar Arctic Vegetation Map (CAVM, Walker et al., 2005), which was developed within the National Science Foundation Arctic Transitions in the Land-Atmosphere System (ATLAS) project, and is presently the most precise mapping of the Arctic tundra. It is available at 1 : 7 500 000 scale at http://www.geobotany.uaf.edu/cavm, in Lambert azimuthal projection, and separates 18 classes describing very precisely the various tundra ecosystems. This data set is based on photointerpretation by vegetation experts of nine Arctic regions, which allowed delineating the various biomes in a NOAA-AVHRR image database.
The second one is the Yakutsk region land cover mapping provided by A. Fedorov (personal communication, 2012), which was derived from Landsat images acquired in 2002 combined with ground truth data. In this map, 12 land cover classes were separated including six different types of forest like larch and birch in different states and three types of grasslands (alas, wet and dry valley meadows). These two maps are mostly used below to evaluate the ability of the various land cover products to separate the shrubs/herbaceous classes and the broadleaf/needleleaf forests.
Lastly, the MODIS global VCFs (Hansen et al., 2003, 2006) and the forest canopy height map proposed by Simard et al. (2011) complemented all these data sets. The VCF product derived from MODIS sensors is provided at 500 m spatial resolution on an annual basis (2000-2010) and at global scales. The VCF is a proportional estimate of vegetative cover types: woody vegetation, herbaceous vegetation, and bare ground. Collection 4 (version 3) was downloaded at http://glcf.umd.edu/data/vcf, where it is available in the same projection as the land cover product. Finally, the forest height product based on 2005 data from the Geoscience Laser Altimeter System (GLAS) on board ICESat (Ice, Cloud, and land Elevation Satellite) is available globally at 1 km spatial resolution and provides an estimation of the canopy height. These last two products provided an independent mapping of the forested areas and were mostly used in the PFT map generation. The data were obtained through the website http://lidarradar.jpl.nasa.gov in GeoTIFF format.
Earth Syst. Sci. Data, 5, 331-348, 2013 www.earth-syst-sci-data.net/5/331/2013/

Harmonized legend approach
Because these land cover products did not have the same spatial resolution and, more importantly, did not use the same classification system, a harmonization procedure was developed. As already discussed in the works dedicated to land cover map cross-comparison (to cite a few: See and Fritz, 2006; Frey and Smith, 2007; Urban et al., 2010; Sulla-Menashe et al., 2011; Pflugmacher et al., 2011; Kaptué Tchuente et al., 2011), the classification method, the original data, the number of thematic classes chosen, etc., can strongly bias the classification results and the overall regional biogeographic characteristics. For example, GLC 2000 and GlobCover legends give more weight to the dominant tree species than to the density character, compared to MODIS. This is probably the result of the classification methodology applied for the MODIS product, which is based on the combined use of surface reflectance and land-surface temperature (LST), contrary to the other products, which use only surface reflectance. The addition of LST, which is known to be highly sensitive to vegetation fraction, could have increased the weight of the tree coverage in the class separation. Accordingly, the LCCS legend used for GLC 2000 and GlobCover defines forest as greater than 15 % tree cover with trees defined as woody plants taller than 5 m, whereas IGBP (used in the MODIS product) defines forest as greater than 60 % tree cover with trees defined as woody plants taller than 2 m. Two other IGBP classes, 8 and 9 (woody savannas/savannas), are then used to represent more open canopies with the same height thresholds but different cover thresholds down to 10 %. In the same way, for shrublands, LCCS distinguishes between evergreen and deciduous species, whereas IGBP considers open and closed types. Further, for barren lands, IGBP merges bare and sparsely vegetated soils, whereas LCCS separates sparse herbaceous, sparse shrubs and bare areas. Given all these features, a comparison could not be performed before having converted all the study products to a common legend. For our final purpose of water and carbon cycle modeling, this common classification must, first, be based on PFT features and, second, discriminate trees, shrubs, water and barren areas as well as leaf type and senescence. This choice leads us to merge the IGBP and LCCS classes under the 16 classes listed in Table 3, which are in close agreement with the GlobCover 2009 legend. The merging rules and the allocation of the ambiguous classes have been driven by the comparison of the spatial distribution of the land cover classes and the help of the auxiliary products, especially the high-resolution maps (CAVM map and the various Landsat images acquired in different subregions of Eurasia). Afterwards, in order to allow the comparison, all the data have been re-projected to the WGS84 plate carrée projection with square pixel size (1/112°, about 1 km scale), using a majority class criterion, since GLC 2000 and GlobCover are already available in this projection and grid.
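The two steps described above, relabeling each product to a common legend and then resampling with a majority class criterion, can be sketched as follows. This is only an illustration under stated assumptions: the class codes and the lookup table are hypothetical and do not reproduce Table 3, and the resampling assumes an integer aggregation factor, which is a simplification of the actual re-projection between grids.

```python
from collections import Counter

# Hypothetical excerpt of a legend lookup (source class code -> common class).
# The codes and groupings are invented for illustration, not taken from Table 3.
ILLUSTRATIVE_LOOKUP = {
    1: "deciduous_needleleaf_forest",
    2: "deciduous_needleleaf_forest",  # e.g. open and closed variants merged
    3: "shrubs",
    4: "sparse_vegetation",
    5: "water",
}


def harmonize(grid, lookup):
    """Relabel every pixel of a 2-D list of class codes with the common legend."""
    return [[lookup[code] for code in row] for row in grid]


def majority_resample(grid, factor):
    """Aggregate factor x factor pixel blocks, keeping the most frequent class.

    Incomplete blocks at the right and bottom edges are dropped. In case of a
    tie, the class encountered first in the block (row by row) is kept.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows - rows % factor, factor):
        out_row = []
        for c0 in range(0, cols - cols % factor, factor):
            block = [grid[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            out_row.append(Counter(block).most_common(1)[0][0])
        out.append(out_row)
    return out


if __name__ == "__main__":
    fine = [[1, 2, 3, 3],
            [1, 4, 3, 5]]
    print(majority_resample(harmonize(fine, ILLUSTRATIVE_LOOKUP), 2))
```

Harmonizing before resampling, as done here, lets classes that are merged in the common legend pool their pixel counts when the majority is taken; the reverse order could favor a different class in heterogeneous blocks.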

Results
The various products have been compared over Siberia, with a focus on central and southwest Siberia (over Yakutia and around Omsk respectively, where high-resolution data were available). The coordinates of these two domains are 55-75° N/104-163° E and 50-58° N/56-96.5° E, respectively. These two regions were chosen because they cover almost all the variety of Siberian ecosystems. The first region, Yakutia (Sakha Federal Republic of Russia, capital Yakutsk), covers a large area of about 3 M km², with 40 % above the Arctic Circle. The region is one of the coldest continental regions in the world (outside Antarctica) with a large annual temperature amplitude varying between −60 and +40 °C. It is entirely covered by permafrost and mainly drained by the Lena River and its tributaries. The vegetation is driven by these extreme climate conditions, which limit the extent of Arctic tundra, composed of lichens and mosses in the north, and the taiga forest mostly composed of deciduous trees (especially larch) in the south. The other region studied is situated in the southwest part of Siberia. It represents an area of about 2 M km² and is part of the Irtysh River catchment. The vegetation is mostly composed of croplands (wheat, barley, potatoes, etc.) and deciduous forests with the predominant species being larch in the north taiga and birch and aspen in the south.
The maps clearly show the latitudinal distribution of tree cover, with forested areas between the mountains of Verkhoyansk and Chersky on the east side and Stanovoy in the south, sparse vegetation in the northern latitudes and bare soils in the mountainous areas. In this large region, the main land cover classes are deciduous needleleaf forest (mostly larch) covering the middle latitudes, and shrubs often mixed with forests and sparse vegetation. Although forest is present in the five products, its fraction and spatial distribution differ significantly among them. Table 4 presents the fraction of each land cover class for each study product. The fractions were calculated excluding the water pixels in order to avoid the Arctic Ocean pixels, which could have biased the statistics. In Table 4, values greater than 10 % have been highlighted in bold (i.e., excluding vegetation types not well represented in this region). The main features are a larger representation of the sparse vegetation class in MODIS (50 % and 64 % for 5.0 and 5.1 products respectively) compared to 8 and 21 % in GLC 2000 and GlobCover 2009, to the detriment of deciduous needleleaf forest (28 % in MODIS 5.1 compared to 58 % in GlobCover 2009). This disagreement was already pointed out by Frey and Smith (2007) when they compared the MODIS product to land cover field-based observations. It can also be noted that the shrub class is represented in the north of the region only in the GLC 2000 product and that regularly flooded areas are more represented in GlobCover and MODIS products compared to GLC 2000, especially in the Lena River delta. The spatial agreements are quantified in Table 5 for the main classes present in this region (i.e., deciduous needleleaf, mixed forests, shrubs, sparse vegetation, herbaceous and bare soils). The statistics were calculated by comparing each product to the GlobCover 2005 one (values above 0.5 are highlighted in bold in the table, and statistics were not calculated for the classes presenting amounts less than 0.5 % like shrubs). As previously, the total agreement percentages do not account for water pixels, which could have biased the accuracy assessment. The results confirm the spatial comparisons: the best agreements are obtained for the two GlobCover products as expected, even though some classes like mixed forests or herbaceous present some discrepancies, probably linked to the low number of pixels involved since these classes represent about 1 % of the GlobCover 2005 map. The comparison with GLC 2000 shows that lower scores are obtained for the same ambiguous classes, which probably present a larger heterogeneity and could suffer from the lower spatial resolution. The evaluation with MODIS maps displays worse statistics especially for shrubs, mixed forests and herbaceous vegetation, and the total agreement values are all lower than 0.5 except when two products of the same family are compared. A qualitative comparison was further performed with the CAVM product (Walker et al., 2005). The comparison of photointerpretations presenting continuous fields with pixel-based classifications has proven difficult, but in specific regions like in the Lena River delta, the different types of tundra appear better separated in the GLC 2000 and GlobCover 2005 products than in the other ones (not shown in this paper).
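The class fractions and agreement statistics discussed above can be illustrated with a minimal sketch. The exact formulas behind Tables 4 and 5 are not given in the text, so the definitions below are assumptions: water pixels are dropped when either map labels them as water, the overall agreement is the share of remaining pixels with identical labels, and the per-class agreement is the share of reference pixels of a class that the other product assigns to the same class. The label strings are illustrative only.

```python
from collections import Counter

WATER = "water"  # illustrative label for the class excluded from the statistics


def class_fractions(labels, exclude=WATER):
    """Fraction of each class among the pixels that are not `exclude`."""
    kept = [label for label in labels if label != exclude]
    counts = Counter(kept)
    return {cls: n / len(kept) for cls, n in counts.items()}


def agreement(reference, other, exclude=WATER):
    """Overall and per-class agreement of `other` with `reference`.

    Both inputs are flat sequences of labels on the same grid. Pixels labelled
    `exclude` in either map are ignored (an assumption of this sketch).
    """
    pairs = [(a, b) for a, b in zip(reference, other)
             if a != exclude and b != exclude]
    overall = sum(a == b for a, b in pairs) / len(pairs)
    per_class = {}
    for cls in {a for a, _ in pairs}:
        ref_n = sum(a == cls for a, _ in pairs)
        both = sum(a == cls and b == cls for a, b in pairs)
        per_class[cls] = both / ref_n
    return overall, per_class


if __name__ == "__main__":
    ref = ["forest", "forest", "sparse", "water", "sparse", "forest"]
    oth = ["forest", "sparse", "sparse", "water", "forest", "forest"]
    print(class_fractions(ref))
    print(agreement(ref, oth))
```

Because the per-class score is normalized by the reference count, it is not symmetric: swapping the two maps generally changes the values, which is one reason to fix a single reference product as done in the text with GlobCover 2005.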

Comparison at finer scale in Yakutsk surroundings
In order to check the accuracy of the various products and the impact of the increased resolution of 300 m on the spatial representation of the land variability, we focused our comparisons around the Yakutsk region, taking advantage of the high-resolution map provided by A. Fedorov's team (personal communication, 2012). This map is shown in Fig. 2 (bottom image). It shows clearly a precise delineation of the evergreen needleleaf forest (identified as "Pine" in light green color) and the mixed forest plots (in light blue ("Pine larch") and in pink ("Birch Larch")) within the dominant deciduous needleleaf ecosystem ("Larch" in middle green), particularly on the right bank of the Lena River. In Fig. 2, only the corresponding images for the GLC 2000 and GlobCover products exhibit this same spatial variability with evergreen needleleaf forests along the river banks in dark green and mixed forests in middle green, whereas the MODIS maps only show the deciduous needleleaf forest in light green. However, the lower spatial resolution of GLC 2000 does not allow representing correctly the riverbed and the flooded areas, contrary to the GlobCover 2005 product in which the regularly flooded herbaceous vegetation class along the riverbed is well characterized. Therefore, in that region, it seems that GlobCover 2005 better captures the main features of the land coverage.

Comparison in southwest Siberia
The second region where we focused our cross-comparison is the southwest part of Siberia where we are interested in analyzing the evolution of agriculture in future works.This region is mostly covered by croplands, deciduous broadleaf and needleleaf forests, and herbaceous and sparse vegetation.The grid cell fractions range between 0.3 and 0.43 for croplands, and are generally lower than 0.2 for the other cover types (Table 6).
The main difference among the LC products lies in the representation of the southern part of the region, covered by sparse and herbaceous vegetation in GlobCover and GLC 2000 maps, and by regularly flooded lands in MODIS. This area at the limit of Kazakhstan is drained by the Irtysh River and presents irrigated croplands, which could be identified wrongly as flooded areas. The main cropland area in purple in Fig. 3 is well delimited between the forested areas in the north and the sparsely vegetated lands in the south. We also noted differences in the eastern part of the region, which is classified as croplands by MODIS and as sparse vegetation in the other products. This disagreement was analyzed more deeply by looking at this region with Landsat images. The views show undoubtedly that this region is covered by agricultural fields presenting an unambiguous spatial structure. The classification discrepancies between crop and sparse vegetation highlight the difficulties in separating such ecosystems if several dates per year are not used to assess the intra-seasonal variability, and if a pixel-based classification technique is used. An object-based classification methodology should have performed better in that case in the absence of higher spatial resolution imagery. In any case, the MODIS classification appears here to be more accurate for crop mapping. The spatial agreement statistics are presented for the main land cover classes in Table 7. The classes that appear in best agreement are the crops and the mixed forest (except for GLC 2000) with values larger than 0.5. The lowest values are obtained for sparse vegetation, which can be mixed with crops or herbaceous ecosystems. Moreover, the deciduous needleleaf and broadleaf forests appear to be better separated in the GlobCover products compared to MODIS and GLC 2000, where all the forested areas are grouped in the mixed forest class. Finally, the overall agreement percentages are very low, with values never exceeding 0.4, similar to what has been previously shown in Yakutia (except when products of the same family are compared).

Forest mapping accuracy
Our motivation being the development of PFT maps for land surface modeling, the characterization of biomes and the separation of forested areas from shrublands are particularly important. Indeed, the vertical structure of forests implies different ground shading, aerodynamic and roughness properties, and consequently significant impacts on surface fluxes. Therefore, a product of forest canopy height like the one proposed by Simard et al. (2011) appears interesting for the interpretation of the land cover product legend as well as for its accuracy assessment. This recent product, based on lidar measurements, provides, at a global scale, the estimation of canopy height at 1 km resolution with an error evaluated against ground truth data of less than 6 m. For the comparison with land cover products, the forest classes were grouped together. Since the lidar product is based on 2005 data, and since the two GlobCover products are very similar, only the GlobCover 2005 map was included in the comparison. In Fig. 4, the extent of forested areas appears better represented in GlobCover 2005 if a threshold of 10 m height is imposed to delineate trees and shrubs. The agreement with the forest class was calculated for the four land cover products (MODIS 5.0, MODIS 5.1, GLC 2000 and GlobCover 2005). The spatial agreements obtained (43.5 %, 37.4 %, 57.9 % and 76.1 %, respectively) show clearly that GlobCover 2005 better captures the degree of woodiness at the land surface, which is an essential parameter for vegetation characterization in carbon and water cycle modeling.
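The comparison between grouped forest classes and the canopy height product can be sketched as a simple pixel-wise test. The text gives the 10 m threshold but not the exact agreement formula, so the definition below is an assumption: a pixel agrees when the land cover product and the thresholded canopy height give the same forest/non-forest answer. The function name and the sample values are illustrative.

```python
def forest_agreement(lc_is_forest, canopy_height_m, threshold_m=10.0):
    """Share of pixels where the land cover forest flag matches the height mask.

    lc_is_forest: sequence of booleans (True where the land cover class belongs
        to the grouped forest classes).
    canopy_height_m: canopy heights in metres on the same grid.
    threshold_m: height at or above which a pixel is treated as trees.
    """
    pairs = list(zip(lc_is_forest, canopy_height_m))
    agree = sum(is_forest == (height >= threshold_m)
                for is_forest, height in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    flags = [True, True, False, False]
    heights = [15.0, 8.0, 3.0, 12.0]
    print(forest_agreement(flags, heights))
```

Given the stated height error of less than 6 m, pixels with canopy heights near the threshold are the ones most likely to flip between forest and shrub, so the score is expected to be sensitive to the chosen threshold.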

Discussion
The results of the cross-comparison of the five land cover products studied, in two different regions of Siberia, show differences and similarities that can be explained either by the lack of resolution (for GLC 2000) or by the methodologies used to assess the class separation and interpretation. The agreement among the maps is highest in the zones that present more homogeneous landscapes (for example inside the taiga region) and lower in the transition zones or in sparse ecosystems for which the class definition is determinant. The contribution of higher resolution products is therefore a significant improvement for discriminating vegetation types and for better mapping such regions. For our modeling purposes, since the main objective is to identify the type of ecosystem whatever its density (which will in any case be provided by the leaf area index (LAI) variable used as forcing or prognostically computed by the big-leaf type DGVM), the definition provided by GlobCover or GLC 2000 appears more valuable. Furthermore, this class definition allows delineating forested areas more precisely, as was demonstrated in the comparison with the recent forest height product. In addition, GlobCover 2005 provides a more precise legend (compared to GlobCover 2009, even if this level of detail is not ensured globally) and an increased spatial resolution (compared to GLC 2000).
For all these reasons, the GlobCover 2005 product was chosen as a basis for the PFT mapping, keeping in mind its class definition, especially the forest classes, which can include pixels with spatial coverage as low as 15 %.This definition more suitable for land cover type identification will require merging with other indices to account for vegetation density.Otherwise, it could lead to a likely overrepresentation of forests in transition zones with tundra in the northern latitudes and with herbaceous cover types in the south.

PFT mapping
Given the GlobCover 2005 land cover map, our next challenge is to define merging rules to build a PFT map for the ORCHIDEE dynamic global vegetation model (DGVM). For that purpose, we have followed the approach of Poulter et al. (2011) and associated with each GlobCover 2005 land cover class the corresponding classification in the ORCHIDEE DGVM. In this work, we focused only on the PFTs present in Siberia. Given the previous studies and drawbacks highlighted in the GlobCover 2005 product, the reclassification rules proposed by Poulter et al. (2011) have been slightly modified and adjusted to boreal ecosystems, as described in Sect. 4.2.
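A cross-walking step of this kind can be sketched as a lookup table that assigns each land cover class a set of PFT shares, which are then weighted by the class area within a grid cell. The class names, PFT names and shares below are hypothetical placeholders chosen for illustration; they are not the merging rules used in this study.

```python
# Hypothetical crosswalk: land cover class -> {PFT name: share}.
# Each row must sum to 1 so that total area is conserved.
CROSSWALK = {
    "closed_needleleaf_deciduous_forest": {"boreal_needleleaf_summergreen": 1.0},
    "sparse_vegetation": {"c3_grass": 0.5, "bare_soil": 0.5},
}


def check_crosswalk(crosswalk, tol=1e-9):
    """Raise if any class distributes more or less than 100 % of its area."""
    for cls, shares in crosswalk.items():
        if abs(sum(shares.values()) - 1.0) > tol:
            raise ValueError(f"shares for {cls!r} do not sum to 1")


def pft_fractions(class_counts, crosswalk):
    """Convert per-class pixel counts within one grid cell to PFT fractions."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("grid cell contains no pixels")
    fractions = {}
    for cls, count in class_counts.items():
        for pft, share in crosswalk[cls].items():
            fractions[pft] = fractions.get(pft, 0.0) + share * count / total
    return fractions


check_crosswalk(CROSSWALK)
# A toy grid cell holding 6 forest pixels and 4 sparse-vegetation pixels.
cell = {"closed_needleleaf_deciduous_forest": 6, "sparse_vegetation": 4}
print(pft_fractions(cell, CROSSWALK))
```

Keeping the rules in a single table makes regional adjustments, such as lowering the forest share of open forest classes, a matter of editing one row rather than changing code.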

ORCHIDEE model
The ORCHIDEE land surface model is a mechanistic dynamic global vegetation model (Krinner et al., 2005) that is part of the IPSL Earth system model (Friedlingstein et al., 2006). It calculates the energy, momentum and hydrological budgets of vegetation and soil, and the entire carbon and nitrogen cycle in the different soil and vegetation pools. Photosynthesis, phenology, allocation of carbon and nitrogen into the different organs, plant growth and mortality, and decomposition of litter and soil organic matter are derived from primitive equations that depend on vegetation characteristics. ORCHIDEE is built on the concept of PFTs to describe vegetation distributions. Species with similar characteristics are grouped together, and the model distinguishes 12 PFTs (tropical evergreen and deciduous forests, temperate broadleaf evergreen and deciduous forests, temperate needleleaf forest, boreal needleleaf evergreen and deciduous forests, boreal broadleaf deciduous forest, natural C3 and C4 grassland, and C3 and C4 crops) plus bare soil. In its standard version, the PFTs are defined from two databases: the AVHRR IGBP 1 km global land cover map (Belward et al., 1999) and the Olson et al. (1983) biome classification, which includes 96 land types (Vérant et al., 2004). The final map prescribes the fraction of each vegetation type over a resolution cell of 5 km. Therefore, different PFTs can coexist in every grid element, and their fractions can vary when the dynamic vegetation submodule is activated. Figure 5 presents the standard PFT maps used in ORCHIDEE.
In Siberia, nine PFTs are present in the standard land cover map. They include four types of forests (temperate and boreal needleleaf evergreen, needleleaf summergreen, broadleaf summergreen forests), C3 grass, C3 crops, some unlikely (but very few) C4 crops and bare soils. The original crosswalk table from Poulter et al. (2011) was slightly modified to map the boreal PFTs better, based on the ancillary satellite data and considering the LCCS legend for regional land cover types. In particular, the fractions of forested PFTs in the open forest classes were decreased to account for the lower density of forests in boreal regions, as were the fractions of bare soil in the sparse vegetation classes to account for mosses in tundra environments. Moreover, because of the absence of the boreal broadleaf evergreen class in the ORCHIDEE PFT classification, this class has been equally distributed between the broadleaf summergreen and the needleleaf evergreen PFT classes. For the same reasons, the lichens were merged with C3 grasses and the shrublands were spread among the C3 grass, forest and bare soil classes. Further, the percentages of forests, bare soils and grasslands (only C3 in boreal zones) were adjusted with the support of the MODIS VCF products. These data make it possible to assess the land surface heterogeneity and the amount of vegetation inside the pixels. For example, Montesano et al. (2009), in their evaluation of the VCF product in the circumpolar taiga-tundra transition zone, showed how these continuous fields capture the forest cover variability and the spatial heterogeneity, especially in land cover transition zones. Therefore, the VCF data for the year 2005 have been upscaled to 1 km scale, and the fractions of trees, grasslands and bare soil have been extracted and averaged for each GlobCover 2005 class. The results permitted a better understanding of the LCCS legend, and the PFT reclassification was then performed according to the new merging rules described in Table 8.
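The per-class averaging of the continuous fields can be sketched as follows. This is a minimal illustration that assumes the class map and the three VCF layers have already been co-registered on the same 1 km grid and flattened to equal-length sequences; the function name, the class identifiers and the percentages are hypothetical.

```python
from collections import defaultdict


def mean_vcf_by_class(class_map, tree, herbaceous, bare):
    """Average tree, herbaceous and bare-soil cover (%) per land cover class.

    All four inputs are equal-length sequences of co-located pixels.
    Returns {class_id: (mean_tree, mean_herbaceous, mean_bare)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for cls, t, h, b in zip(class_map, tree, herbaceous, bare):
        acc = sums[cls]
        acc[0] += t
        acc[1] += h
        acc[2] += b
        counts[cls] += 1
    return {cls: tuple(v / counts[cls] for v in acc) for cls, acc in sums.items()}


# Toy example with two illustrative class identifiers (values are made up).
classes = [90, 90, 150, 150]
tree_cover = [40.0, 20.0, 0.0, 10.0]
herb_cover = [50.0, 60.0, 30.0, 20.0]
bare_cover = [10.0, 20.0, 70.0, 70.0]
print(mean_vcf_by_class(classes, tree_cover, herb_cover, bare_cover))
```

The resulting class means give an indication of how much tree, grass and bare-soil cover a legend class typically contains in the region, which is the kind of evidence used to adjust the shares in a crosswalk table.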

ORCHIDEE PFTs
The new PFT maps have been generated from the GlobCover 2005 data set, keeping the benefit of the high [...]

Earth Syst. Sci. Data, 5, 331-348, 2013, www.earth-syst-sci-data.net/5/331/2013/

This index, which is 0 for full agreement and √2 for full disagreement, is displayed in Fig. 8. The agreement is best in the northern latitudes and worst in the center of Siberia, where the fractions of grasslands and forested PFTs have been modified the most.
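The definition of the index is not reproduced in this section. One index with exactly these bounds is the Euclidean distance between the PFT fraction vectors of the two maps in a grid cell: it is 0 when the fractions are identical and √2 when each map assigns the whole cell to a different single PFT. The sketch below is written under that assumption only; the function and PFT names are hypothetical.

```python
import math


def pft_disagreement(fractions_a, fractions_b):
    """Euclidean distance between two PFT fraction dictionaries.

    Assumed form of the index (not confirmed by the text). Each input maps
    PFT name -> fraction, with fractions summing to 1; missing PFTs count as 0.
    """
    names = set(fractions_a) | set(fractions_b)
    return math.sqrt(
        sum((fractions_a.get(n, 0.0) - fractions_b.get(n, 0.0)) ** 2 for n in names)
    )


standard = {"boreal_needleleaf_evergreen": 1.0}
new_map = {"c3_grass": 1.0}
print(pft_disagreement(standard, standard))  # identical maps
print(pft_disagreement(standard, new_map))   # fully different single PFTs
```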

Discussion and conclusion
Land cover mapping is crucial for many environmental studies, and regrouping the classes into PFTs is necessary for the specific purposes of land surface modeling. In this study focused on Siberia, we compared five medium-resolution land cover maps derived from remote sensing and highlighted some discrepancies, mostly linked to the legend definition adopted. The strengths and weaknesses of each product were shown, and the results led us to choose the GlobCover 2005 product because of its highest spatial resolution and more detailed legend. A new PFT map at 1 km scale over Siberia has therefore been generated for the ORCHIDEE DGVM. This map shows large differences compared to the standard maps in the differentiation of broadleaf and needleleaf forests and in the representation of landscape heterogeneity. The fractions of the various ecosystems are smoothed and seem to represent the vegetation diversity better, thanks to the use of higher resolution data sets for the PFT mapping.
These differences should significantly impact the DGVM simulations. Indeed, PFT fractions are used to define the vegetation characteristics in terms of photosynthetic capacity, phenology, roughness, etc. All these properties are decisive for the calculation of the water and carbon fluxes, especially the evapotranspiration and gross primary production fluxes. Consequently, such modifications should impact the biosphere-atmosphere exchanges and will be analyzed in further work.
This study also showed the difficulties in linking vegetation classes to a limited number of PFTs, constrained by global modeling and computing time issues. The absence of a shrub PFT and the resulting distribution of the shrub classes among grasslands, bare soils and forests are not satisfactory. Such vegetation types have such different properties that it appears difficult to represent the energy and mass transfers well with an aggregation of such variability. In the same way, moss- and lichen-dominated ecosystems are not represented in the final PFT map and are assimilated to bare soils; likewise, regularly flooded areas and peatlands have been spread between the grasslands and water classes, which could lead to significant errors in terms of the carbon cycle. Therefore, the development of new PFT classes in ORCHIDEE, to better represent these specific northern ecosystems, appears to be a priority if one wants to represent boreal ecosystems and their future evolution correctly.

Figure 1
Figure 1 presents the land cover maps extracted from the five global products (GLC 2000, GlobCover 2005 and 2009, MODIS 5.0 and 5.1) at 1 km scale under the harmonized legend discussed previously. The five products were compared for corresponding time periods. Thus, GlobCover 2005 was compared with the MODIS 2005 data set, GlobCover 2009 with the MODIS 2009 product and GLC 2000 with MODIS 2001. The maps clearly show the latitudinal distribution of tree cover, with forested areas between the mountains of Verkhoyansk and Chersky on the east side and Stanovoy in the south, sparse vegetation in the northern latitudes and bare soils in the mountainous areas. In this large region, the main land cover classes are deciduous needleleaf forest (mostly larch) covering the middle latitudes, and shrubs often mixed with forests and sparse vegetation. Although forest is present in all five products, its fraction and spatial distribution differ significantly among them. Table 4 presents the fraction of each land cover class for each product studied. The fractions were calculated excluding the water pixels in order to avoid the Arctic Ocean pixels, which could have biased the statistics. In Table 4, values greater than 10 % have been highlighted in bold (i.e., excluding vegetation types not well represented in this region).
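The class fractions described here, computed over land pixels only, can be sketched in a few lines. The class labels, the `water` label and the pixel values are hypothetical; the 10 % cutoff mirrors the bold highlighting rule stated above.

```python
from collections import Counter


def class_fractions(class_map, water_label="water"):
    """Fraction of each land cover class, computed over non-water pixels only."""
    land = [c for c in class_map if c != water_label]
    if not land:
        raise ValueError("no land pixels in the map")
    return {cls: n / len(land) for cls, n in Counter(land).items()}


def dominant_classes(fractions, minimum=0.10):
    """Classes whose fraction exceeds the given minimum (here 10 %)."""
    return sorted(cls for cls, f in fractions.items() if f > minimum)


# Toy flattened map: half of the pixels are ocean and must not bias the result.
pixels = ["forest"] * 11 + ["shrub"] * 8 + ["crop"] * 1 + ["water"] * 20
fractions = class_fractions(pixels)
print(fractions)
print(dominant_classes(fractions))
```

Without the water mask, the forest fraction in this toy map would drop from 0.55 to 0.275, which illustrates why ocean pixels are excluded before comparing products.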

Figure 2.
Figure 2 presents the comparison of the current forest height product with GLC 2000, GlobCover 2005 and the two MODIS products extracted for the year 2005. Qualitatively,

Table 1.
List of the land cover products used and their characteristics.

Table 2.
Aligning the legends of global maps: dominant life-form type (LFT) and corresponding land cover classes from GLC 2000, GlobCover and MODIS IGBP.

Table 3.
Harmonized legend used and correspondence with original product classes.

Table 5.
Agreement percentages in Yakutia (for the main classes and comparison of GlobCover 2005 with GlobCover 2009, GLC 2000, MODIS 5.0, MODIS 5.1). Values larger than 0.5 appear in bold.

Table 6.
Fraction of each class in each product for South Siberia. Values larger than 0.1 appear in bold.

Table 7.
Agreement percentages in South Siberia (for the main classes and comparison of GlobCover 2005 with GlobCover 2009, GLC 2000, MODIS 5.0, MODIS 5.1). Values larger than 0.5 appear in bold.
Table 8 presents the merging rules that have been defined to reclassify the GlobCover 2005 classes present in Siberia into the ORCHIDEE PFTs. The original crosswalk table from Poulter et al. (

Table 8.
Merging rules from GlobCover classes to ORCHIDEE PFTs.