REVISITING THE DEMOCRATIC REPUBLIC OF THE CONGO STRATIFICATION MAP FOR THE YEAR 2000 USING CLOUD-BASED COMPOSITING AND OBJECT-BASED CLASSIFICATION ALGORITHMS

National stratification maps are essential to improve forest management systems. For the Democratic Republic of the Congo, the existing maps derived from remote sensing techniques do not optimally represent the diverse land cover classes of the national stratification scheme. This situation stems from cloud persistence, seasonality effects and the spatial resolution of the input satellite imagery, which is not always adequate for discriminating certain land cover classes. This paper explores a cloud-based median luminance best pixel approach to obtain a cloud-free mosaic of optimal quality. The mosaic required nearly 2,500 Landsat scenes, and a subsequent object-based classification produced a stratification map for the year 2000 that follows the national stratification scheme. A stratified random sampling design with 1,141 reference samples yielded an estimated map accuracy of 79.32%. Land cover class areas computed following standard good practices recommendations for estimating land areas indicated that the forest area (all forest classes combined) was about 158,810,975 ± 7,460,671 ha, representing 68.40% ± 3.21% of the country area. Thanks to free, user-friendly and cloud-based platforms for satellite image processing, the methodology implemented is easily replicable in other tropical countries.

Introduction:-
The forests of the Democratic Republic of Congo (DRC) represent around 60% of the overall Congo Basin forest estate and play an important role in the sequestration of atmospheric CO2, thus contributing to balancing global greenhouse gas fluxes [1,2]. Monitoring the dynamics of Congolese forests is therefore important, particularly since the advent of the REDD+ mechanism [2,3]. To progress in meeting the requirements of the Warsaw Framework for REDD+, the DRC recently submitted its first Forest Reference Emission Level (FREL), covering the period 2000-2014, to the United Nations Framework Convention on Climate Change (UNFCCC) [4].
The year 2000 was chosen as the base year of the DRC's FREL as it corresponds to the reference year for several map products at the national level [4].
Recently, [12] have developed a national stratification map for the year 2000 at 30 m spatial resolution. Although at this spatial resolution it is theoretically possible to discriminate the different land cover classes constituting the DRC's forest classification system, this map only represents two general classes (i.e., forest and non-forest) and therefore cannot meet the requirement of obtaining a homogeneous and detailed cartography of the national stratification scheme.
Meanwhile, cloud-based geospatial data display and processing platforms, such as Google Earth (GE), Collect Earth (CE), Google Earth Engine (GEE) and the System for Earth Observation Data Access, Processing and Analysis for Land Monitoring (SEPAL) [13], have significantly reduced the time required to process spatial data to derive homogeneous and detailed vegetation maps [14]. These free, user-friendly, cloud-based platforms for satellite image processing make remote sensing techniques accessible to a larger number of practitioners. SEPAL, for instance, is a cloud-based supercomputing platform that, like Collect Earth, is part of the Open Foris tools [15]. SEPAL provides access to, and processing of, historical and recent satellite image archives from the USGS (United States Geological Survey) Landsat (L4-5, L7 and L8) and ESA (European Space Agency) Copernicus (Sentinel-1 and 2) programs. CE, for its part, is a tool for querying high and very high spatial resolution satellite images and collecting various data through the GE archives, in conjunction with Bing Maps and GEE [16].
There is therefore a need to demonstrate how cloud-based geospatial processing algorithms can be routinely used to improve capacities for producing reliable classification maps that reflect national stratification schemes. Such a demonstration is of paramount importance in developing countries, where the computing capacity to process large amounts of satellite imagery is still low and where adequate infrastructures and standard operating protocols to produce consistent land cover maps through time are still lacking.
The objective of the present research is to use these cloud-based platforms to produce a DRC-wide stratification map for the reference year 2000 that discriminates the land cover classes described in the country's national stratification scheme. The specific objectives are threefold: (1) using the best median pixel approach to produce an annual composite free of seasonality effects and missing data (i.e., clouds, poor quality images, atmospheric haze, etc.); (2) conducting an object-oriented supervised classification to produce a historical map of the year 2000 with improved spatial discrimination of the vegetation types described in the national stratification scheme; and (3) generating reliable land cover class areas at the national scale. In addition, the approach developed in this research is meant to be straightforward so that it can be easily replicated in other tropical countries that still face challenges in producing reference maps that comply with national stratification schemes.

Methodology:-
Figure 1 summarizes the methodology, which consists of five steps: (1) delineation of blocks; (2) generation of mosaics; (3) segmentation and calibration of classification models; (4) object-based classification; and (5) evaluation of results, including the calculation of the number of reference samples to be collected and the production of land cover area statistics. These steps are detailed below.

Delineation of blocks
Because the DRC straddles the Equator, seasonality is inverted between the north and the south of the country, which makes it difficult to map all of the country's ecosystems simultaneously using satellite-based remote sensing techniques [11]. To circumvent this issue, the country was divided into three blocks (Figure 2) to stratify seasonality. The northern block (in blue) corresponds to the area covered by the Sudanian and Sudano-Guinean savannah, while the central (in orange) and southern (in red) blocks correspond to the areas of dense humid forest and Zambezian savannah, respectively [2].
Figure 2:- Delimitation of the DRC into three seasonal blocks (blue, orange and red).

Mosaics generation
Landsat scenes were collected per block according to specific Julian Days (JD) to generate per-block mosaics within the SEPAL platform [13]. Image selection was constrained to a maximum of 10% cloud cover in order to generate a high-quality, cloud-free composite for the year 2000. To achieve the cloud-free objective, the strategy consisted in increasing the number of scenes collected per block and (Table 1).

Segmentation and calibration of classification models
Multi-resolution segmentation by region [18] was used to partition the mosaics of each block into homogeneous segments [19][20][21][22]. These segments were built by merging the spectral and spatial information of adjacent pixels using the MIR, NIR and R Landsat bands [23,24,21]. The segmentation was constrained to produce finer segments (objects) in heterogeneous landscapes, referred to here as S-N1 segments of about 3.41 ± 4.80 ha, and larger ones in homogeneous landscapes, referred to as S-N2 segments of about 22.40 ± 22.41 ha [25,21,26,22]. An S-N2 segment corresponds to an aggregation of several S-N1 segments and represents a dense and spatially homogeneous vegetation cover. Homogeneous segments were aggregated in this way to reduce the time required to calibrate the classification model.
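Returning to the mosaic generation step: the compositing itself was run in SEPAL, whose implementation is not described here. As a hedged illustration of one plausible reading of a "median luminance best pixel" rule, the NumPy sketch below keeps, for each pixel, the clear (non-cloudy) observation whose luminance is closest to the per-pixel median of the clear observations, and copies all bands of that observation. Defining luminance as the mean of the band values, and the function name `median_luminance_best_pixel`, are assumptions made for illustration only.

```python
import warnings

import numpy as np


def median_luminance_best_pixel(stack, cloud_mask):
    """Per-pixel best-pixel composite based on median luminance (sketch).

    stack: float array (n_scenes, n_bands, rows, cols) of band values.
    cloud_mask: bool array (n_scenes, rows, cols), True where the observation
        is cloudy or otherwise invalid.
    Returns a (n_bands, rows, cols) composite, NaN where no clear observation exists.
    """
    # Hypothetical luminance definition: mean of the band values (assumption).
    luminance = stack.mean(axis=1)
    luminance = np.where(cloud_mask, np.nan, luminance)  # discard cloudy observations

    # Median luminance of the clear observations at each pixel.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)  # pixels with no clear observation
        median = np.nanmedian(luminance, axis=0)

    # Distance of each clear observation to the median; cloudy ones can never win.
    distance = np.abs(luminance - median)
    distance = np.where(np.isnan(distance), np.inf, distance)
    best_scene = np.argmin(distance, axis=0)  # (rows, cols) index of the chosen scene

    # Copy all bands of the chosen scene for every pixel.
    composite = np.take_along_axis(stack, best_scene[None, None, :, :], axis=0)[0]
    composite = composite.astype(float)
    composite[:, np.all(cloud_mask, axis=0)] = np.nan  # no clear observation at all
    return composite
```

Selecting an actual observation (rather than averaging bands) keeps each output pixel spectrally consistent, since all bands come from the same acquisition.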
The classification model calibration for the dense and homogeneous landscapes of the central block (Figure 3) and of certain areas of the southern block was carried out on S-N2 segments, from which so-called reference segments (RS) were extracted within each block. The RSs were visually identified within each block on landscapes characteristic of a specific targeted vegetation type. Calibration thus consisted in assigning to each RS its corresponding vegetation type, using the following inputs for decision-making: (i) expert knowledge from the authors, (ii) the NDVI (Normalized Difference Vegetation Index) [27] (Equation 1) and (iii) the NDWI (Normalized Difference Water Index) [28] (Equation 2). These two indices are commonly used in remote sensing for their sensitivity to vegetation cover and to water availability, respectively [29,30]. They have already been successfully applied to characterize certain vegetation types in the DRC, notably the edaphic forest and grassland classes [11].
$$\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{R}}{\rho_{NIR} + \rho_{R}} \tag{1}$$

where $\rho_{NIR}$ is the luminance of the Near Infrared spectral band and $\rho_{R}$ the luminance of the Red spectral band.

$$\mathrm{NDWI} = \frac{\rho_{NIR} - \rho_{MIR}}{\rho_{NIR} + \rho_{MIR}} \tag{2}$$

where $\rho_{NIR}$ is the luminance of the Near Infrared spectral band and $\rho_{MIR}$ the luminance of the Middle Infrared spectral band.

For landscapes that are less dense and more heterogeneous, especially in the northern and southern blocks, the classification model was calibrated at the S-N1 segment level.
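Equations (1) and (2) share the same normalized-difference form, so a single helper covers both. The sketch below is a minimal NumPy version, assuming the inputs are per-pixel or per-segment mean values of the relevant bands; for Landsat 5 TM and 7 ETM+, red, NIR and SWIR-1 are bands 3, 4 and 5, and treating "MIR" as SWIR-1 is an assumption here. Division by zero yields NaN rather than an error.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), element-wise; NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    total = a + b
    out = np.full_like(total, np.nan)
    np.divide(a - b, total, out=out, where=total != 0)
    return out


def ndvi(nir, red):
    """Equation (1): NDVI from NIR and red values."""
    return normalized_difference(nir, red)


def ndwi(nir, mir):
    """Equation (2): NDWI from NIR and middle-infrared values."""
    return normalized_difference(nir, mir)
```

Applied to segment means (for example `ndvi(segment_nir_means, segment_red_means)`, with hypothetical array names), this gives one index value per reference segment to support the calibration decisions described above.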
In addition to homogeneity and density, other information available in the scientific literature and in the country's national stratification standards [8] was used to distinguish certain vegetation types in the RSs. This includes canopy texture (i.e., rough or coarse vs. smooth) [10,11], which was used to improve the calibration of the classification model, particularly for certain highly heterogeneous landscapes in the northern and southern blocks.

Object-based classification:-
A supervised object-based classification [31] using the nearest neighbor maximum likelihood algorithm was applied to derive vegetation types by block. This algorithm has proven efficient for mapping vegetation types [32,33], and the chosen classification approach has the distinct advantage of requiring only a small set of training samples per vegetation type within each block.
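The text does not detail the classifier's internals. As a simplified, hedged illustration of the nearest-neighbour part only, the sketch below labels each segment with the class of the closest reference segment in a standardized feature space; it does not include any maximum-likelihood step. The per-segment features (for example mean band values, NDVI, NDWI) and the function name are hypothetical.

```python
import numpy as np


def nearest_neighbour_classify(train_features, train_labels, objects):
    """Label each object with the class of its closest training segment.

    train_features: (n_train, n_features) features of the reference segments.
    train_labels: sequence of n_train class names.
    objects: (n_objects, n_features) features of the segments to label.
    """
    train = np.asarray(train_features, dtype=float)
    objs = np.asarray(objects, dtype=float)

    # Standardize with training statistics so no single feature dominates.
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0
    train_z = (train - mean) / std
    objs_z = (objs - mean) / std

    # Squared Euclidean distances, shape (n_objects, n_train).
    d2 = ((objs_z[:, None, :] - train_z[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return [train_labels[i] for i in nearest]
```

Because each class needs only a few representative reference segments, a rule of this kind fits the small training sets per vegetation type mentioned above.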
The labeled objects (segments or polygons) resulting from the classification process were subsequently converted to a raster format at a spatial resolution of 30 m, equivalent to the Landsat spatial resolution, to obtain the stratification map of the whole country.

Results Evaluation:-
The evaluation of the classification output was carried out through a stratified random sampling design of reference samples throughout the whole DRC.

Number of reference samples;
The total number of reference samples (n) was computed with Equation (3). The n samples were then allocated to the different classes in proportion to each class area, and a minimum of 100 samples was allocated to any minority class [36].
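Equation (3) itself is not reproduced in this text. A formula widely used for stratified random designs in the good-practices literature is Cochran's $n = \left(\sum_i W_i S_i / S(\hat{O})\right)^2$, where $W_i$ is the map area proportion of class $i$, $S_i = \sqrt{U_i(1-U_i)}$ for an anticipated user's accuracy $U_i$, and $S(\hat{O})$ is the target standard error of the overall accuracy. Assuming Equation (3) has this form, the sketch below computes n and then applies one possible reading of the allocation rule: proportional to class area, with every class raised to at least 100 samples. Function names and inputs are hypothetical.

```python
import math


def sample_size(weights, expected_user_acc, target_se):
    """Cochran-type total sample size for stratified random sampling (assumed form).

    weights: map area proportion W_i of each class (summing to 1).
    expected_user_acc: anticipated user's accuracy U_i of each class.
    target_se: target standard error of the overall accuracy.
    """
    total = sum(w * math.sqrt(u * (1 - u)) for w, u in zip(weights, expected_user_acc))
    return math.ceil((total / target_se) ** 2)


def allocate(n, weights, minimum=100):
    """Proportional allocation, raising every class to at least `minimum` samples."""
    return [max(minimum, round(n * w)) for w in weights]
```

Note that raising minority classes to the minimum increases the total beyond n unless samples are removed from the larger classes; which option was used is not stated in the text.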

Interpretation of reference samples;
Reference samples were visually interpreted in Collect Earth (CE) [16] using a square evaluation unit of 0.09 ha (30 m × 30 m) [37], corresponding to the spatial resolution of the stratification map. Interpretation was mainly carried out on images from the year 2000, giving priority to high spatial resolution satellite images whenever available [38]. Where images for the year 2000 were missing, images acquired around that year were used (Table 1).
To guarantee the independence of the reference samples, they were interpreted without regard to the class predicted by the automated classification (Section 2.2.4). Furthermore, the vegetation class of each reference sample was assigned using the majority rule within each evaluation unit. Comparing the reference samples with the automated classification results yielded a confusion matrix, which was used to calculate the common map accuracy indices, namely user's accuracy, producer's accuracy, overall accuracy and the Kappa index [39,36].
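The accuracy indices listed above can be computed directly from the confusion matrix. The sketch below assumes rows are map classes and columns are reference classes, and returns the sample-count versions of the indices; the function name is hypothetical.

```python
import numpy as np


def accuracy_indices(cm):
    """Overall, user's and producer's accuracies and Cohen's kappa.

    cm[i, j]: number of reference samples mapped as class i (rows = map)
    and interpreted as class j (columns = reference).
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / total
    users = diag / cm.sum(axis=1)      # correct / mapped (row totals)
    producers = diag / cm.sum(axis=0)  # correct / reference (column totals)
    # Chance agreement expected from the row and column marginals.
    expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total**2
    kappa = (overall - expected) / (1 - expected)
    return overall, users, producers, kappa
```

Because minority classes were deliberately over-sampled, area-weighted versions of these indices (weighting each map-class row by its area proportion) differ from the plain sample-count versions shown here.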

Areas calculation;
Two approaches were used to calculate the areas of the different classes: pixel counting and statistical estimation. Pixel counting was conducted at the provincial level and summarized at the national level. The statistical estimation of class areas and their associated errors was performed using SEPAL's Stratified Area Estimator Analysis module [13], following the good practices approach for estimating land areas proposed by [35].
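The internals of SEPAL's module are not described here. The sketch below implements the standard stratified estimator from the good-practices literature, with map classes as strata: the proportion of reference class $k$ is $\hat{p}_k = \sum_i W_i\, n_{ik}/n_{i\cdot}$, its variance is $\sum_i W_i^2 \,\hat{p}_{ik}(1-\hat{p}_{ik})/(n_{i\cdot}-1)$ with $\hat{p}_{ik} = n_{ik}/n_{i\cdot}$, and areas follow by multiplying by the total mapped area. It is an illustration under these assumptions, not SEPAL's code; the function name is hypothetical.

```python
import numpy as np


def stratified_area_estimate(cm, mapped_area, z=1.96):
    """Class area estimates and confidence half-widths (map classes = strata).

    cm[i, k]: sample counts, rows = map class i, columns = reference class k.
    mapped_area[i]: area mapped as class i (e.g. ha, from pixel counts).
    Returns (area estimates, CI half-widths), one value per reference class.
    """
    cm = np.asarray(cm, dtype=float)
    mapped_area = np.asarray(mapped_area, dtype=float)
    total_area = mapped_area.sum()
    w = mapped_area / total_area                    # stratum weights W_i
    n_i = cm.sum(axis=1, keepdims=True)             # samples per stratum
    p_ik = cm / n_i                                 # within-stratum proportions
    p_k = (w[:, None] * p_ik).sum(axis=0)           # estimated class proportions
    var_k = ((w[:, None] ** 2) * p_ik * (1 - p_ik) / (n_i - 1)).sum(axis=0)
    area = total_area * p_k
    half_width = z * total_area * np.sqrt(var_k)    # e.g. 95% CI for z = 1.96
    return area, half_width
```

The estimated areas always sum to the total mapped area; the correction relative to pixel counting comes from redistributing area according to the off-diagonal (misclassified) samples.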
Because the forest class is of major interest, in particular for operationalizing the REDD+ mechanism at the national level, the final stratification map was subsequently aggregated into two major classes, namely Forest (i.e.,

Landsat mosaic
Figure 4 shows the mosaic for the year 2000 at 30 m spatial resolution, resulting from the aggregation of the three block mosaics (i.e., north, center and south), built from a total of 2,492 Landsat scenes. Despite the large number of Landsat scenes required, the mosaic needs very little memory space, so it can easily be displayed and processed on the standard computers commonly available in the national remote sensing institutions of most developing countries. Thanks to the median luminance best pixel mosaicking process, the mosaic covers the entire country homogeneously. The cloud persistence common in the southwest was significantly reduced, so that the remaining clouds on the mosaic cover only 0.2% of the country area (No Data in Table 2).

Table 2 shows the pixel count-based areas of the different classes by province and their relative proportions at both national and provincial levels.

Dense moist forest on land is the predominant vegetation in the country, with an area of 99,661,380 ha, close to 43% of the national territory. This class is mainly present in the provinces of Tshopo, Tshuapa, Bas-Uelé, Sankuru, Maniema and Mai Ndombe. The edaphic forest class, which remains flooded almost all year round and includes the extent of DRC peatlands [40], is mainly present in the provinces of Equateur, Mai Ndombe and Tshuapa. Secondary forest results from anthropogenic pressure and is generally found along the main roads, close to farming communities, rural agglomerations and urban centres [11].
The close to open deciduous woodland class combines a deciduous forest with an open canopy and an undergrowth grassland layer. Also called "Miombo forest", it is found in the south-east and south-west of the country, in the Lualaba and Haut-Katanga provinces and in Kwango, respectively. Submontane forest and mountain forest are found exclusively in the eastern provinces of the country, mainly North and South Kivu. Grassland is one of the most widespread vegetation types north and south of the central forest massif, as well as in the west of the country in the Kongo-Central province.
Finally, Woodland and Shrubland were aggregated into a single class due to the lack of auxiliary information, particularly their respective ranges of canopy heights, which could improve their discrimination. These two types of savanna form a buffer zone at the frontier between dense moist forest and the grasslands that predominate in the Kasai Oriental province. Table 3 presents the results of the statistical area estimation and the associated errors for each class. The Forest 2000 class was estimated at 158,810,975 ± 7,460,671 ha, representing about 68.40% ± 3.21% of the national area. The area estimate of the Urban & Bare areas class differs significantly from that obtained with the pixel count method, with a relative error of about 72.42%. This difference is due to commission errors of the Urban & Bare areas class, notably 4 samples mapped as Urban & Bare areas on the Vgt DRC 2000 map that in fact corresponded to secondary forest (1 sample) and grassland (3 samples) according to the visual interpretation of the reference points.
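The pixel count-based areas and the comparisons above reduce to simple arithmetic. The sketch below uses the 0.09 ha area of one 30 m pixel stated in the text; taking the statistical estimate as the denominator of the relative difference is an assumption, since the text does not say which denominator produced the 72.42% figure. Function names are hypothetical.

```python
PIXEL_AREA_HA = 0.09  # one 30 m x 30 m pixel = 900 m2 = 0.09 ha


def pixel_count_area_ha(n_pixels):
    """Area in hectares covered by n_pixels 30 m pixels."""
    return n_pixels * PIXEL_AREA_HA


def relative_difference_pct(pixel_count_area, estimated_area):
    """Difference between the two area figures, relative to the statistical estimate (%)."""
    return abs(pixel_count_area - estimated_area) / estimated_area * 100


def share_of_country_pct(area, half_width, country_area):
    """Express an area and its confidence half-width as % of the country area."""
    return area / country_area * 100, half_width / country_area * 100
```

The last helper mirrors how an estimate such as 158,810,975 ± 7,460,671 ha is reported as a percentage ± half-width of the national area.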

Accuracy assessment
The confusion matrix in Table 4 results from the cross-comparison of the Vgt DRC 2000 map with the visual interpretation of the reference samples. The overall accuracy of the Vgt DRC 2000 map is estimated at 79.32%, with a Kappa index of 0.78. The Mountain forest and Dense moist forest classes have the highest user's accuracies (92% and 90%, respectively), whereas the Woodland & Shrubland stratum has the lowest user's accuracy, about 63%.

… provides information on the horizontal physiognomy of the landscape, particularly the degree of canopy openness. Horizontal physiognomy was a key parameter for discriminating between dense and open forests.
The vertical physiognomy was characterized by canopy texture, based on the shape of tree crowns. It was used to distinguish edaphic forest from dense moist forest and even from secondary forest. The smooth texture specific to edaphic forest is due to the persistence of water in the undergrowth throughout the year, whereas the canopy of dense moist forest has a rougher texture.
Altitude, also used by several authors including [11,7], made it possible to discriminate between low- and high-altitude vegetation, particularly mountain and submontane forests.

Minimum number of samples per land cover class
The accuracy assessment of the Vgt DRC 2000 map followed the good practices recommendations of [35]. These practices relate to three main elements: sampling design, response design and analysis of reference data.
The most commonly used sampling methods are systematic sampling and stratified random sampling [43]. The first selects samples at a regular spatial interval, while the second selects samples randomly within each land cover class, either in equal numbers or in proportion to the class areas on the map [44]. The latter was used in this research, even though it yields few samples in land cover classes that cover small proportions of the map [43]. To address this potential under-representation of classes covering small areas, [36] proposed allocating between 20 and 100 samples to these classes. Thus, for the Vgt DRC 2000 map, 100 samples were allocated to each class considered a minority class, namely mountain and submontane forests, mangroves and water bodies.
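Once an allocation per class is fixed, drawing the stratified random sample amounts to picking pixel locations at random, without replacement, inside each map class. The sketch below assumes the map is held as a 2-D array of integer class codes and that the allocation is a dict of class code to sample count; both representations and the function name are assumptions for illustration.

```python
import numpy as np


def stratified_random_sample(class_map, allocation, seed=0):
    """Draw random pixel locations per map class, without replacement.

    class_map: 2-D integer array of map class codes.
    allocation: dict {class_code: number of samples}.
    Returns dict {class_code: (n, 2) array of (row, col) positions}.
    """
    rng = np.random.default_rng(seed)
    samples = {}
    for code, n in allocation.items():
        rows, cols = np.nonzero(class_map == code)
        if n > rows.size:
            raise ValueError(f"class {code} has only {rows.size} pixels")
        pick = rng.choice(rows.size, size=n, replace=False)
        samples[code] = np.column_stack([rows[pick], cols[pick]])
    return samples
```

Fixing the seed makes the sample reproducible, which helps when the same reference points must later be re-interpreted or audited.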

Mapping evaluation and products comparison
The conversion of the object-based stratification map to a raster format resulted in a final product at 30 m spatial resolution. This conversion did not induce any significant loss of information in terms of class areas (Table 6). However, comparing the pixel count and statistical area estimates revealed significant differences for the Woodland & Shrubland and Urban & Bare areas classes (Table 3). These differences are mainly attributable to the 30 m spatial resolution of the final map, which is still too coarse to discriminate tree and shrub savannas at the frontier with dense moist forest. The same applies to the discrimination between the Urban & Bare areas and Grassland classes, especially in provinces where the latter class is predominant (e.g., Kasai-Oriental, Lomami, Haut-Lomami; see Table 2). Furthermore, the low user's accuracy found for Woodland & Shrubland in the confusion matrix (Table 4) is also a direct consequence of the poor discrimination between these two classes. However, when all the savannas are aggregated into a single class, the pixel count and statistical area estimates for that class are much closer, and the overall relative error of the map decreases by about 14%, from 4.32% (Table 3) to 3.71% (Table 7), which is significant.

The visual comparison in Figure 6 shows that, unlike the map produced by [11] (Figure 6, B-1), the Vgt DRC 2000 map (Figure 6, A-1) discriminates the secondary forest intrusions in the core of the dense moist forest. The main reason for this difference is that the coarser spatial resolution of the map of [11] does not allow detailed land cover shapes to be distinguished. The Vgt DRC 2000 map is, in turn, very similar to the map produced by [3] (Figure 6, C-1), which has a 60 m spatial resolution.
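The relative decrease of about 14% quoted above follows directly from the two overall relative errors reported in Tables 3 and 7:

$$
\frac{4.32\% - 3.71\%}{4.32\%} = \frac{0.61}{4.32} \approx 0.141 \approx 14\%
$$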
A similar agreement is observed with the map of [4] (Figure 6, D-1), which has the same spatial resolution as the Vgt DRC 2000 map, even though the map of [4], unlike the three other maps, treats secondary forest as non-forest. The comparison of pixel count-based areas between the four maps also shows some discrepancies for the major land cover class, the dense humid forest, here considered as the aggregation of all forest classes, i.e., dense moist forest, edaphic forest, secondary forest, close to open deciduous woodland (Miombo), submontane forest and mountain forest (Table 8).

¹ Aggregation of the following classes: edaphic forest, dense moist forest, old secondary forest, young secondary forest, rural complex, close to open deciduous woodland (Miombo).
² Aggregation of the following classes: dense forest and forest/other mosaic.
The area of the dense humid forest class for the year 2000 is very close between the Vgt DRC 2000 map and the map produced by [3], even though the two products result from different methodological approaches, namely an object-based classification for the first and a pixel-based approach for the second. This finding is corroborated by [45], who observed that these two classification approaches lead to similar area estimates in homogeneous and poorly fragmented landscapes such as the dense moist forest.

Conclusions:-
This paper proposed an alternative methodology to produce a 30 m DRC-wide stratification map according to the land cover classes defining the national stratification scheme as described by [8]. This research showed that the median luminance best pixel approach made it possible to obtain a cloud-free mosaic of optimal quality that minimizes seasonality effects. The mosaic also proved to require very little memory space on standard computers, even though the time series used comprised nearly 2,500 Landsat scenes. The areas of the different land cover classes and the associated errors were estimated following the good practices recommendations for estimating land areas [35], based on a set of 1,141 reference points spread over the entire national territory using stratified random sampling. The accuracy indices indicated that the map was indeed representative of the different class areas for the year 2000, particularly for the forest, which is of paramount interest. The area of the Forest class (all forest classes combined) was estimated at 160,684,890 ha (pixel count-based area) and 158,810,975 ± 7,460,671 ha (statistical area estimation), i.e., approximately 68.40% ± 3.21% of the country area. Thanks to the use of free, user-friendly and cloud-based platforms for satellite image processing, the methodology implemented in this research can be replicated in other Congo Basin countries where stratification maps are still an issue, in order to fully operationalize the REDD+ mechanism and for other forest management purposes.