Harmonized Pan-European Time Series for Monitoring Soil Sealing

: The European Copernicus Land Monitoring Service (CLMS) has been producing datasets on imperviousness every 3 years since 2006. However, for 2018, the input for the production of the imperviousness dataset was switched from mixed inputs to the Sentinel constellation. While this led to an improvement in the spatial detail from 20 m to 10 m, this also resulted in a break in the time series as the 2018 update was not comparable to the previous reference years. In addition, the European CLMS has been producing a new dataset from 2018 onward entitled CLC+ Backbone, which also includes a sealed area thematic class. When comparing both datasets with sampled reference data, it appears that the imperviousness dataset substantially underestimates sealed areas at the European level. However, the CLC+ dataset is only available from 2018 and currently does not include any change layer. To address these issues, a harmonized continental soil sealing combined dataset for Europe was produced for the entire observation period. This new dataset has been validated to be the best current dataset for monitoring soil sealing as a direct input for European policies with an estimated total sealed area of 175,664 km 2 over Europe and an increase in sealed areas of 1297 km 2 or 0.7% between 2015 and 2018, which is comparable to previous time periods. Finally, recommendations for future updates and the validation of imperviousness degree geospatial products are given.


Introduction
Soil sealing is part of the land take process, i.e., when natural or agricultural land is converted to settlements, infrastructure and commercial or industrial use.The process covers the ground by an impermeable material such as asphalt or concrete.Imperviousness is the state of soil where it cannot be penetrated by air and water and in practice soil sealing and imperviousness are terms used synonymously.However, while soil sealing indicates an anthropogenic process, imperviousness also refers to natural impenetrable surfaces like rocks and glaciers.Hence, all sealed surfaces are impervious, but some impervious surfaces are not sealed.
Soil sealing often affects fertile agricultural land, and as sealed areas cannot be penetrated by air and water, the process puts biodiversity at risk.Soil sealing also increases the risk of flooding in urban areas in case of flash floods and it creates urban heat islands, which impacts human health.European policy making urges data providers to monitor soil sealing to be informed on development, agriculture, environment or sustainability at multiple levels of government.Given the importance of soils to the European Union's (EU) policy objectives on climate change mitigation and adaptation, food security and biodiversity, soils contribute to achieving the main objectives of the European Green Deal, as set out in the EU Soil Strategy for 2030.To support efforts to improve soil health, the European Commission has undertaken two key initiatives.Firstly, they established a mission on soil that utilizes a network of living laboratories to test solutions for improving soil.Secondly, in 2023, the commission proposed a Soil Monitoring Law.This law mandates monitoring of land take and soil sealing indicators across the EU [1].While not directly regulating soil sealing itself, the data collected through this monitoring will be crucial for developing informed policies to address this issue.
Several regularly updated datasets based on land monitoring are available to measure imperviousness at the European scale.They are based on in situ monitoring, remote sensing or a combination of both.The Land Use/Cover Area frame Survey (LUCAS) is an in situ land monitoring program [2] that provides harmonized and comparable statistics on land use and land cover across the whole of the EU's territory.It provides every 3 years data on land cover and land use through some 300,000 field observations sampled from a 2 by 2 km grid covering the EU.However, LUCAS does not cover European Environment Agency (EEA) cooperating countries.In addition, the definition of the artificial thematic class in LUCAS is only partially related to sealed surfaces and there are too few LUCAS points over urban areas to make accurate estimates of changes in soil sealing.
Datasets based on remote sensing are available from the Copernicus Land Monitoring Service (CLMS) and include the CLMS CORINE Land Cover + Backbone (CLC+ BB) for 2018 [3] (2021; available at end of 2024) and the time series of the imperviousness degree layer [4], every 3 years from 2006 to 2018 (2021; available at end of 2024).Other global data sources exist, such as the Global Human Settlement Layer (GHSL) described in [5] and the World Settlement Footprint (WSF) [6].However, both GHSL and WSF are global products and their respective definitions are more focused on assessing built-up areas, with a focus on buildings, whereas roads or parking spaces are not considered when disaggregating population census data into a gridded product.
There is a natural tendency from the remote sensing user community to extract area statistics from such spatial datasets (i.e., "pixel counting") from earth observation (EO)based geospatial products to produce statistical indicators for various purposes as discussed in [7].However, geospatial map products suffer from misclassification errors and "pixel counting" can be strongly biased unless the accuracy of such map products reaches a level when these misclassification errors can be considered negligible.There has been a considerable effort in the remote sensing community to assess the accuracy of map products against reference data to ensure that the maps could reach a sufficient level of accuracy.Despite substantial advances in this topic in the scientific literature in recent years notably with paper [8], this has yet to be fully implemented in operational projects.In addition, even if map accuracy assessment is performed correctly, high accuracy does not necessarily mean that area statistics can be directly extracted from a map as highlighted in [9].This happens because the requirement level is not the same for mapping accuracy and for statistics.It may happen that an 80% mapping accuracy is considered good enough for a thematic map, but this may lead to up to an unacceptable 20% bias in the area estimates if commission and omission errors are strongly unbalanced with commission errors defined as areas classified as the theme of interest but should be classified as something else and omission errors defined as areas that should be classified as the theme of interest but were classified as something else.One way to address this is to apply a so-called Model Assisted Regression or Regression estimator as detailed in [10], which combines reference observations with EO-based products.However, applying such an approach is dependent on the availability of reference data at a disaggregated level.
In addition, even though CLMS products are probably the best available datasets to monitor soil sealing over Europe, the 2018 imperviousness degree layer was produced on a 10 m spatial resolution, taking advantage of the availability of Sentinel 2 data, whereas Land 2024, 13, 1087 3 of 20 earlier layers are provided on 20 m resolution.Unfortunately, the upgrade to 10 m from 20 m spatial resolution has resulted in a break in the IMD-based areal statistics; the 20 m resolution IMD time series (2006-2009-2012-2015) could be successfully harmonized and have shown a credible evolution in sealed cover.The IMD2018 dataset exhibits significantly more sealed structures than previous layers; thus, the amount of sealed cover is showing an unrealistic growth compared to former status.
Therefore, the aim of this paper is to use the time series and available sample reference data [11] for the development of a reliable spatially explicit time series dataset of sealed area in Europe.In this context, Europe is the study area represented by all EEA member and cooperating countries and the UK and further referred to in this paper as EEA38 + UK.The total area covered represents 5.85 million km 2 with the exception of the French overseas regions located in south and central America and the Indian Ocean.The objectives of this study are three-fold: (i) assess the level of bias from existing sealing datasets as compared to available reference data, (ii) make use of the available reference data to develop a suitable approach for harmonization including the provision of a freely accessible and harmonized time-series dataset and (iii) make recommendations for future developments of earth observation-based products.

Available Soil Sealing Geospatial Data Layers
The pan-European CLMS portfolio includes two kinds of image classifications containing detailed information about the extent and structure of artificially sealed areas: the High-Resolution Layer (HRL) Imperviousness [4] and the CLC+ Backbone raster [3] datasets.

CLMS Imperviousness Degree Layers
Imperviousness data are at the time of publication of this paper available for the reference years 2006, 2009, 2012, 2015 and 2018, and contain two types of products: status and change layers referred to as the HRL IMperviousness (IMD) and IMperviousness Classified Change (IMCC), respectively.The IMD layers capture the spatial distribution of artificially sealed areas, including the level of sealing of the soil per area unit.The level of sealed soil (imperviousness degree: 1-100%) is produced using an automatic algorithm based on the calibrated Normalized Difference Vegetation Index (NDVI) as described in [12,13].The percentage of sealed area is mapped for each status layer for the five reference years.The IMD status layers are available in 10 m spatial resolution (2018) and 20 m spatial resolution (2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015) and as aggregated 100 m datasets.Two types of IMCC products are available for each of the 3-year periods between the five reference years (2006-2009, 2009-2012, 2012-2015, 2015-2018): • A simple layer mapping the percentage of sealing increase or decrease for those pixels that show sealing change during the period covered.This product is available in 20 m and 100 m pixel sizes.• A classified change product that maps the most relevant categories of sealing change (no sealing, new cover, loss of cover, unchanged sealed, increased sealing, decreased sealing).This product is available in the 20 m pixel size only.

CLMS CLC+ Backbone Layer
The CORINE Land Cover (CLC) product has been the flagship of the Copernicus Land Monitoring Service for almost thirty years (1990-2018), providing users with harmonized land cover/land use data at the continental scale.However, CLC has a relatively coarse spatial resolution, and a lack of detail in certain land cover categories.In response, the European Environment Agency has developed a suite of new products and applications, known collectively as CLC+.Among these new products, the CLC+ Backbone (BB) 2018 dataset provides a wall-to-wall land cover map of Europe on 10 m resolution, for each pixel showing the dominant land cover among the 11 basic land cover classes.The dataset is available as a 10 m raster.The semantic content of class 1 (sealed areas) is by definition very similar to the semantic content of IMD data.The planned update cycle of CLC+ BB data is a minimum of 2 years after the foreseen 2021 update-comparable to IMD.
Figure 1 shows the comparison of the IMD layer for 2015 and 2018 and the 2018 CLC+ BB artificial layer converted to 100 m against the Bing aerial dataset.Some of the differences between the IMD2015 and 2018 layers are not due to changes but more likely related to omission in the lower-resolution 2015 layer.For example, much of the road network was omitted in 2015 and some of it appears in 2018; these roads were not built in those 3 years but the low-resolution 2015 layer does not capture these landscape elements.In addition, the CLC+ BB layer appears to provide even more spatial details in 2018 when compared to the IMD 2018 layer.
2018 dataset provides a wall-to-wall land cover map of Europe on 10 m resolution, for each pixel showing the dominant land cover among the 11 basic land cover classes.The dataset is available as a 10 m raster.The semantic content of class 1 (sealed areas) is by definition very similar to the semantic content of IMD data.The planned update cycle of CLC+ BB data is a minimum of 2 years after the foreseen 2021 update-comparable to IMD.
Figure 1 shows the comparison of the IMD layer for 2015 and 2018 and the 2018 CLC+ BB artificial layer converted to 100 m against the Bing aerial dataset.Some of the differences between the IMD2015 and 2018 layers are not due to changes but more likely related to omission in the lower-resolution 2015 layer.For example, much of the road network was omitted in 2015 and some of it appears in 2018; these roads were not built in those 3 years but the low-resolution 2015 layer does not capture these landscape elements.In addition, the CLC+ BB layer appears to provide even more spatial details in 2018 when compared to the IMD 2018 layer.

Sample Design and Stratification
An extensive validation exercise was performed as part of a dedicated CLMS project with more than 20,000 Primary Sample Units (PSUs, Figure 2) collected to assess the accuracy of imperviousness data layers.A two-stage sampling approach was implemented by selecting PSUs and Secondary Sampling Units (SSUs) within the PSUs.Two-stage sampling is considered suitable for accuracy assessment of land cover maps or area estimation [14] and can be adopted in certain cases because it represents a good compromise between the ease of data collection and a good geographic distribution.In our work, 1 ha square

Reference Data 2.2.1. Sample Design and Stratification
An extensive validation exercise was performed as part of a dedicated CLMS project with more than 20,000 Primary Sample Units (PSUs, Figure 2) collected to assess the accuracy of imperviousness data layers.A two-stage sampling approach was implemented by selecting PSUs and Secondary Sampling Units (SSUs) within the PSUs.Two-stage sampling is considered suitable for accuracy assessment of land cover maps or area estimation [14] and can be adopted in certain cases because it represents a good compromise between the ease of data collection and a good geographic distribution.In our work, 1 ha square PSUs were selected and a grid of 5 × 5 SSUs was applied to facilitate the data collection as shown in Figure 2.
Land 2024, 13, x FOR PEER REVIEW 5 of 20 PSUs were selected and a grid of 5 × 5 SSUs was applied to facilitate the data collection as shown in Figure 2. The approach for drawing PSU locations was based on the LUCAS 2 × 2 km grid representing a total of 1,100,000 points across Europe [15].The LUCAS point is then used to define the top left corner of the 1 ha PSU.However, considering the different character istics and class definition of the imperviousness layers, the thematic information of LU CAS points was not used directly with the exception of the LUCAS landscape photos which proved to be particularly useful for clarifying certain cases when they were availa ble.The main advantage of using a LUCAS-based approach is that a systematic approach ensures full traceability and it is also possible that sampling units will be shared for sev eral products, providing potential economies of scale.
Impervious areas remain relatively rare over Europe, covering less than 5% of the area (even though it has increased steadily over the years).Therefore, in case of the HRL images, a suitable stratification approach was required to ensure that sufficient reference observations are available.For both status and change layers, CORINE Land Cover artifi cial classes (CLC [16] and Open Street Map (OSM) road network) were used and con verted to a binary layer to identify potential omission errors.Relevant OSM road types were selected and rasterized to 100 m (for example, abandoned, construction, cycleway path, planned, trail, track…were removed) to obtain potential sealed area candidates.Us ing the relevant selection of OSM road types lead to a better spatialization of artificial and impervious areas.
CLC impervious classes were defined as follows based on CLC2018: The following stratification approach was adopted based on omission/commission strata for the Imperviousness Status and change layers and is defined as follows: • Stratum 1, commission status strata, where the imperviousness degree is between 1 and 100% in the 2015 layer.The approach for drawing PSU locations was based on the LUCAS 2 × 2 km grid representing a total of 1,100,000 points across Europe [15].The LUCAS point is then used to define the top left corner of the 1 ha PSU.However, considering the different characteristics and class definition of the imperviousness layers, the thematic information of LUCAS points was not used directly with the exception of the LUCAS landscape photos, which proved to be particularly useful for clarifying certain cases when they were available.The main advantage of using a LUCAS-based approach is that a systematic approach ensures full traceability and it is also possible that sampling units will be shared for several products, providing potential economies of scale.
Impervious areas remain relatively rare over Europe, covering less than 5% of the area (even though it has increased steadily over the years).Therefore, in case of the HRL images, a suitable stratification approach was required to ensure that sufficient reference observations are available.For both status and change layers, CORINE Land Cover artificial classes (CLC [16] and Open Street Map (OSM) road network) were used and converted to a binary layer to identify potential omission errors.Relevant OSM road types were selected and rasterized to 100 m (for example, abandoned, construction, cycleway, path, planned, trail, track. ..were removed) to obtain potential sealed area candidates.Using the relevant selection of OSM road types lead to a better spatialization of artificial and impervious areas.
CLC impervious classes were defined as follows based on CLC2018: The following stratification approach was adopted based on omission/commission strata for the Imperviousness Status and change layers and is defined as follows: • Stratum 1, commission status strata, where the imperviousness degree is between 1 and 100% in the 2015 layer.At first, the number of PSUs to allocate to each stratum was calculated as a function of the stratum area in a so-called proportional allocation.Then, for small strata, a minimum number of PSUs were allocated as suggested in [8].This was achieved by densifying the LUCAS grid to achieve the required number.Similarly for very large strata, there may be too many PSUs and the required number was achieved by subsampling the LUCAS grid.
In the second phase, selected PSUs were overlaid with countries or groups of countries with an area greater than 90,000 km 2 (see Figure 3 below) to ensure that a required minimum of 50 PSUs were available for each stratum.If not, additional PSUs were selected for those countries or group of countries.The purpose was to be able to assess any heterogeneity in the quality of the data across different regions.At first, the number of PSUs to allocate to each stratum was calculated as a function of the stratum area in a so-called proportional allocation.Then, for small strata, a minimum number of PSUs were allocated as suggested in [8].This was achieved by densifying the LUCAS grid to achieve the required number.Similarly for very large strata, there may be too many PSUs and the required number was achieved by subsampling the LUCAS grid.
In the second phase, selected PSUs were overlaid with countries or groups of countries with an area greater than 90,000 km 2 (see Figure 3 below) to ensure that a required minimum of 50 PSUs were available for each stratum.If not, additional PSUs were selected for those countries or group of countries.The purpose was to be able to assess any heterogeneity in the quality of the data across different regions.As explained in [17] and applied in [18], this stratified systematic sampling approach leads to unequal sampling intensities between different strata, which need to be accounted for by applying a weight factor ( ) to each sample unit based on the ratio between the number of sample units and the area of the stratum considered:  = . .
, where i corresponds to a given stratum, N is the total number of possible units (population) and n is the number of PSUs.Not applying this correction would result in underestimating or overestimating map accuracies.
There were several iterations performed for the validation of IMD products, starting with the IMD 2012 layer for which the initial methodology was developed as described in As explained in [17] and applied in [18], this stratified systematic sampling approach leads to unequal sampling intensities between different strata, which need to be accounted for by applying a weight factor (w i ) to each sample unit based on the ratio between the number of sample units and the area of the stratum considered: ŵi = n.N i n i .N , where i corresponds to a given stratum, N is the total number of possible units (population) and n is the number of PSUs.Not applying this correction would result in underestimating or overestimating map accuracies.
There were several iterations performed for the validation of IMD products, starting with the IMD 2012 layer for which the initial methodology was developed as described in [10].When the 2015 IMD status layer was produced, the 2006, 2009 and 2012 status layers were reprocessed and the whole time series was validated with an initial total number of 18,005 PSUs across EEA38 + UK.In addition, change layers were also validated with an additional set of PSUs, resulting in a total of 22,777 PSUs as described in [19].For the validation of the 2018 layers, the initial set of 18,005 PSUs were used but the interpretation was performed by a different team and for the validation of the 2015-2018 change layer, a subset of 10,005 PSUs was used with an overall total of 25,777 individual PSUs selected across all time periods as described in [20].However, it should be noted that not all PSUs were used for any given time period due to missing data or cloud cover.

Response Design
Response design is the methodology used to obtain the reference data for the sample units [21].Ref. [14] asserted that accuracy assessment is often made relative to some "higher quality determination of land cover".Ref. [22] indicated that visual interpretation is acceptable if the spatial resolution of EO data is sufficiently better compared to the thematic classification system.For this exercise, the visual interpretation of selected sample units will be used as a basis of the response design.The response design follows the definition of the products to be assessed both in terms of geometric and thematic characteristics.Therefore, the following characteristics are strictly followed and closely monitored as part of the visual interpretation: Ensure that the image data used are as close as possible temporally to that used for the map production.
Reference data can be collected with a double-blind protocol (without knowledge of the classes reported in the EO product (land cover map or classified image)) or following a plausibility approach.The plausibility approach considers that there can be errors in the reference data collected through visual interpretation as well as in the map itself.The plausibility approach reduces the amount of identified errors by comparing the sample with the map, thus adding another source of information for the photo-interpreter who then judges if it is reasonably coherent with the reality observed on finer spatial and/or temporal resolution images.
Statistical theory assumes that both EO and reference data are independent.Therefore a double-blind approach is preferable.However, double-blind data collection may produce major distortions in the presence of co-registration inaccuracy or conceptual uncertainty in the application of class labels.In the case of the imperviousness degree, conceptual uncertainty is low when a point has been perfectly located, but there may be a distance up to 10-20 m between the photo-interpreted reference point and the center of the corresponding pixel in a classified image.The apparent disagreement may be due to location inaccuracy rather than image classification errors.The use of 1 ha clusters as the PSUs (Figure 2) strongly reduces this anomaly.
Ref. [23] concluded that ignoring visual interpretation errors would result in underestimating the variance of the area estimate and developed an area estimator considering interpretation errors based on seven interpreters assessing the same sample.This would be difficult to implement over very large areas, but interpretation errors should be minimized as much as possible.Therefore, as a Quality Assurance and Control measure, at the start of the visual interpretation process, a certain number of samples (at least 10-20%) was interpreted twice and analyzed to identify the reasons for discrepancies.The percentage of samples interpreted twice decreases as the interpretation process progresses and as a result discrepancies decrease.
To ensure full traceability and transparency of the whole process, a double-blind approach was first applied, and a plausibility approach was applied at the end of the interpretation process on PSUs that disagreed with the map product.
In addition, some of the samples (around 10%) were interpreted by separate quality control interpretation teams as suggested in [23,24] and specific training sessions were Land 2024, 13, 1087 8 of 20 organized to confront discrepancies occurring between different interpretation teams to ensure that a consensus is reached.

Accuracy Assessment
Thematic accuracy is usually presented in the form of an error matrix containing the results of the interpretation.To be valid, a confusion or error matrix should be derived from a probability sample and additionally an unequal sampling intensity resulting from the stratified systematic sampling approach should be accounted for.This requires applying a weight factor (w i ) to each sample unit based on the ratio between the number of samples and the size of the stratum considered as described at the end of Section 2.2.1.
A number of accuracy metrics can be computed from the error or confusion metrics such as overall, producer (related to omission errors) and user (related to commission errors) accuracies.In this case, we are dealing with a single theme.
Previously, the CLMS imperviousness degree layer was validated by applying a 30% threshold, thus transforming the impervious degree layer into a binary layer [11].However, this does not consider the full range of values.Using continuous disagreement values requires an adaptation of the approach to compute confusion matrices [25,26].
Among the possible disagreement indicators, we propose to quantify the disagreement at the PSU level as the differences between the map value m i and the reference r i .The Mean Absolute Error (MAE), , where w i is the extrapolation weight (inverse of the sampling intensity), and Root Mean Square Error (RMSE), RMSE = ∑ i w i (m i −r i ) 2 ∑ i w i , are common metrics used in this context and are currently used to assess the accuracy of the Global Human Settlement Layer (GHSL, [27] also now produced as part of the Copernicus Global Land initiative).In addition, if the map value is larger (lower) than the reference value for a PSU, it will contribute to the commission (omission) error as illustrated in Figure 4.
In addition, some of the samples (around 10%) were interpreted by separate quality control interpretation teams as suggested in [23,24] and specific training sessions were organized to confront discrepancies occurring between different interpretation teams to ensure that a consensus is reached.

Accuracy Assessment
Thematic accuracy is usually presented in the form of an error matrix containing the results of the interpretation.To be valid, a confusion or error matrix should be derived from a probability sample and additionally an unequal sampling intensity resulting from the stratified systematic sampling approach should be accounted for.This requires applying a weight factor ( ) to each sample unit based on the ratio between the number of samples and the size of the stratum considered as described at the end of Section 2.2.1.
A number of accuracy metrics can be computed from the error or confusion metrics such as overall, producer (related to omission errors) and user (related to commission errors) accuracies.In this case, we are dealing with a single theme.
Previously, the CLMS imperviousness degree layer was validated by applying a 30% threshold, thus transforming the impervious degree layer into a binary layer [11].However, this does not consider the full range of values.Using continuous disagreement values requires an adaptation of the approach to compute confusion matrices [25,26].
Among the possible disagreement indicators, we propose to quantify the disagreement at the PSU level as the differences between the map value mi and the reference ri.The Mean Absolute Error (MAE),  = , are common metrics used in this context and are currently used to assess the accuracy of the Global Human Settlement Layer (GHSL, [27] also now produced as part of the Copernicus Global Land initiative).In addition, if the map value is larger (lower) than the reference value for a PSU, it will contribute to the commission (omission) error as illustrated in Figure 4. with   as the positive part function.The same approach can also be applied to RMSE to distinguish between omission and commission errors.This approach will allow us to assess commission and omission errors for the full range of the imperviousness degree layer values.For the MAE, the commission φ would be computed as φ = ∑ i w i pos(m i −r i ) ∑ i w i and omission ψ as ψ = ∑ i w i pos(r i −m i ) with pos(x) as the positive part function.The same approach can also be applied to RMSE to distinguish between omission and commission errors.This approach will allow us to assess commission and omission errors for the full range of the imperviousness degree layer values.
Using pos(m i − r i ) and pos(r i − m i ) as disagrerement indicators is similar to the con- cept of quantity disagreement proposed in [29] using the PSU as a measurement unit.

Biased and Unbiased Area Estimates
Biased area estimates are typically pixel counts where, in our case, area statistics are extracted from the HRL 100 m imperviousness degree layers and reported for the selected areas by taking the average imperviousness degree value of all pixels over a selected area.This value can then be multiplied by the total area in km 2 of the selected area to provide the biased area estimate.For the imperviousness change layers, the same procedure can be applied for pixels identified as gain or loss.It should also be noted that another important drawback of pixel counting is the absence of possibility of calculating the uncertainty of the resulting estimates.
Unbiased area estimates can be derived directly from the field data alone using the so-called direct expansion method [30], as long as the reference data have been collected based on a probabilistic sample and we assume that reference data are accurate.In the case of Simple Random Sampling (SRS) with equal probability, the estimate of proportion (y) of class (c) is given by y c = 1 n y i and its variance is var(y , where y i is the proportion of segment i (100 m pixels) covered by class c, N is the total number of segments in the region and n is the number of segments in the sample.The proportion of the study region sampled (n/N) is the sample fraction.The variance calculation above assumes a single-stage sampling; if a two-stage sampling is applied, this needs to be accounted for as described in [31].The estimate of class area (Z) in the study area (D) is as follows: Ẑc = D * y c ; and variance is var Ẑc = D 2 * var(y c ), where D is the area of the region.The direct expansion estimators are calculated for each stratum present in the AoI (area of interest) and the total estimate just corresponds to the weighted average of the proportions according to the area covered by each stratum.The standard error for the whole area is then as follows: σ Total = D 2 h .Var h , where D h is the stratum area.The 95% confidence interval (CI) is usually computed as +/− 1.96* σ Total ., although the level of confidence can be smaller if sample size in each stratum is not large enough [32].
As a starting point for the harmonization procedure, the direct estimator applied on the 2018 sample validation data (described in Section 2.2.1) provides a total estimated sealed area of 160,434 ± 7495 km 2 (95% CI) at the EEA38 + UK level as shown in Table 1.The total sealed area mapped by CLMS IMD2018 is close to 109,000 km 2 , while CLC+ BB reports 175,000 km 2 .Therefore, CLMS IMD2018 is underestimating sealed areas by approximately 51,000 km 2 (or 44,00 km 2 if we consider the 95% CI).Even larger differences are observed for the previous years.However, the difference between CLC+ BB and the direct estimate is much smaller with CLC+ BB slightly overestimating sealed areas by ca.15,000 km 2 (or 7000 km 2 with the 95%CI).It should be noted that two estimates are available for each year because years were assessed as pairs and not all sample units are always covered due to missing data or cloud cover from the IMD layer.However, both estimates are not statistically different with overlapping 95% confidence intervals (Table 1).For 2015, the differences are greater, because the IMD layer for 2015 was reprocessed following the 2018 production.In addition, the validation data for the 2015-2018 assessment were based on a smaller and different number of PSUs as compared to the previous years.
As explained in Section 2.2.1, there were different versions of the reference dataset for the status layer validation and the change layer validation.For consistency, the results for the status layer validation presented in Table 1 were based on the change validation dataset, which was deemed to be more accurate considering the greater number of sample units and the fact that it was produced after the reference dataset for the validation of the status layer.The results for gains presented in Table 2 are also based on the same dataset.
For 2006-2015, the change layers are either within the bounds of the 95% confidence interval of the CLMS validation data change area estimates or less than 20% above (Table 2).For 2015-2018, the differences are greater than 150%.This confirms the expectation that there is a discontinuity in the dataset with 2018 and even though 2006-2009 is also potentially problematic, this is far less critical by a factor close to 9 (268 km 2 vs. 2316 km 2 ).Therefore, our efforts concentrated on correcting the 2015-2018 IMCC data, and the change data from previous time periods especially for 2009-2012 and 2012-2015 was considered as acceptable.In addition, the focus was on correcting the gain data because losses can be considered negligible in comparison (there are very few areas where sealed surfaces a re-naturalized).

Harmonization Approach
The following subsections describe the approach that was adopted to produce what is referred to as the harmonized imperviousness change/time series.
Based on the results above, the CLC+ BB sealed class appears to be more accurate than the IMD2018 layer.Therefore, a first step to correct overall impervious areas for 2018 relied on the extraction of CLC+ BB artificial class 1 into a binary 10 m layer.
The harmonization of the 2015-2018 change layer is based on calculating the areas of gain in dedicated calibration units and a subsequent filtering out of omission errors in gains through a reclassification procedure of the 2015-2018 IMCC as detailed as follows.
Step 1-Calculating the areas of gain for each of the 91 combinations of production/bioregions: Based on experience, it is known that there is potentially substantial spatial variability in the quality of the IMCC layer.Therefore, an initial approach was to assess the level of change based on the CLMS validation data for each of the 91 combinations of production units (ca 300 × 300 km 2 ) and environmental zones [33], which we can refer to as calibration units (Figure 5).This represents a reasonable compromise between the level of spatial details and a sufficient number of PSUs available for each calibration unit (the total number is 95, but for the time being, the French overseas regions were excluded from the analysis).
This approach is similar to that of [34] for soybean mapping in the United States in which reference data were collected within 20 × 20 km blocks and the resulting soybean area estimates within each block were used to calibrate the soybean map.In our case, sealing areas and sealing change are much rarer phenomena to map.Therefore, our calibration units are larger with a smaller number of observations.Adjustments were estimated for all calibration units with two exceptions, (i) calibration units for which no gain was detected based on the validation data; and (ii) calibration units for which the level of gain from the map layer is less than for the validation data, and therefore would require us to add missing gain area, which is not possible with the adopted procedure.
as calibration units (Figure 5).This represents a reasonable compromise between the level of spatial details and a sufficient number of PSUs available for each calibration unit (the total number is 95, but for the time being, the French overseas regions were excluded from the analysis).This approach is similar to that of [34] for soybean mapping in the United States in which reference data were collected within 20 × 20 km blocks and the resulting soybean area estimates within each block were used to calibrate the soybean map.In our case, sealing areas and sealing change are much rarer phenomena to map.Therefore, our calibration units are larger with a smaller number of observations.Adjustments were estimated for all calibration units with two exceptions, (i) calibration units for which no gain was detected based on the validation data; and (ii) calibration units for which the level of gain from the map layer is less than for the validation data, and therefore would require us to add missing gain area, which is not possible with the adopted procedure.
Step 2-Discriminating function to separate actual gain from omission from previous periods.
In order to correct the large differences seen at the EEA38 + UK level and at the calibration units' level, some discriminating function needs to be applied to the gain data to separate real changes from other detected gain areas.There are several ways this could be achieved notably based on Machine/Deep Learning approaches, but this would require the input satellite imagery used in the production and considerable resources that were not available for this work.
Ref. [35] applied different thresholds locally to the density values to adjust the area covered by the Global Forest Change dataset from [36] and reduce the amount of bias, concluding that at the national level, a 70% threshold applied to the dataset was a closer Step 2-Discriminating function to separate actual gain from omission from previous periods.
In order to correct the large differences seen at the EEA38 + UK level and at the calibration units' level, some discriminating function needs to be applied to the gain data to separate real changes from other detected gain areas.There are several ways this could be achieved notably based on Machine/Deep Learning approaches, but this would require the input satellite imagery used in the production and considerable resources that were not available for this work.
Ref. [35] applied different thresholds locally to the density values to adjust the area covered by the Global Forest Change dataset from [36] and reduce the amount of bias, concluding that at the national level, a 70% threshold applied to the dataset was a closer depiction of the actual 30% threshold from the national forest definition, but for some areas, a more localized threshold adaptation was needed.Ref. [34] used the probability layer from an initial soybean classification in the USA to apply a separate threshold for each stratum to produce a binary layer that matched the soybean area estimates from collected field data.
As mentioned earlier, in our case, no reliable density or probability layers were available and a different approach was adopted based on the assumptions that most gains typically occur in the vicinity of existing sealed areas and that many of the detected omissions are often linked to the road network, which are further away from existing sealed objects.Therefore, an approach based on the distance of objects from the current vectorized change layer and the 100 m 2015 sealing status layer was applied as illustrated in Figure 6.The average Euclidian distance of each gain area polygon (red in Figure 6) from the IMCC layer for 2015-2018 to existing sealed area in 2015 was calculated and stored as an attribute.
are often linked to the road network, which are further away from existing sealed objects Therefore, an approach based on the distance of objects from the current vectorized change layer and the 100 m 2015 sealing status layer was applied as illustrated in Figure 6.The average Euclidian distance of each gain area polygon (red in Figure 6) from the IMCC layer for 2015-18 to existing sealed area in 2015 was calculated and stored as an attribute.A distance threshold was then applied interactively to gain objects for each calibration unit to obtain, as close as possible, the required adjustment.Gain objects within the distance threshold to existing sealed areas (in the 2015 layer) were classified as actual gain and objects outside were classified as omissions as shown in Figure 7.Most omissions appear to represent either road segments now classified using the higher spatial resolution of the Sentinel 2 input data or some large objects.In the enlarged extract shown, green objects are classified as omissions because the mean Euclidian distance of these objects is above the threshold set for these calibration units, whereas red objects are classified as change.
and objects outside were classified as omissions as shown in Figure 7.Most omissions appear to represent either road segments now classified using the higher spatial resolution of the Sentinel 2 input data or some large objects.In the enlarged extract shown, green objects are classified as omissions because the mean Euclidian distance of these objects is above the threshold set for these calibration units, whereas red objects are classified as change.The main output from this stage is a reclassified vector IMCC1518 sealing gain dataset in which gain polygons were either classified as omissions' or actual gains'.
Step 3-Production of revised sealed 100 m spatial resolution time series 2006-2018: The harmonized gain IMCC layers for 2015-18 and the binary sealed layer derived from the 2018 CLC+ BB raster layers are used to produce a new 2015 status layer at 10 m resolution and layers for previous years by applying the following procedure: 1.The reclassified vector IMCC1518 layer is rasterized as a 10 m layer, only keeping values classified as actual sealing' gains.2. This new sealing gain layer is combined with the losses and stable areas from the original IMCC1518 layer at 10 m to create a revised IMCC1518 layer.3. Gain areas are removed from and losses added to the binary sealed layer derived from the CLC+ BB sealed 2018 10 m layer to create a new revised 2015 status layer.4. Subsequent status layers for year n (e.g., year 2012) are produced from combining the status layer from the consecutive year (e.g., 2015) with the corresponding IMCC layer (e.g., IMCC 2012-2015) at 10 m resolution. 5.All 10 m layers are aggregated to 100 m spatial resolution, thus producing the harmonized dataset.
It should be stated that the 10 m layers produced from the procedure above should not be used directly because they result from a statistical calibration procedure, and the discrimination between omission and actual change even if correct at the production unit The main output from this stage is a reclassified vector IMCC1518 sealing gain dataset in which gain polygons were either classified as 'omissions' or 'actual gains'.
Step 3-Production of revised sealed 100 m spatial resolution time series 2006-2018: The harmonized gain IMCC layers for 2015-2018 and the binary sealed layer derived from the 2018 CLC+ BB raster layers are used to produce a new 2015 status layer at 10 m resolution and layers for previous years by applying the following procedure: 1.
The reclassified vector IMCC1518 layer is rasterized as a 10 m layer, only keeping values classified as 'actual sealing' gains.

2.
This new sealing gain layer is combined with the losses and stable areas from the original IMCC1518 layer at 10 m to create a revised IMCC1518 layer.

3.
Gain areas are removed from and losses added to the binary sealed layer derived from the CLC+ BB sealed 2018 10 m layer to create a new revised 2015 status layer.4.
Subsequent status layers for year n (e.g., year 2012) are produced from combining the status layer from the consecutive year (e.g., 2015) with the corresponding IMCC layer (e.g., IMCC 2012-2015) at 10 m resolution. 5.
All 10 m layers are aggregated to 100 m spatial resolution, thus producing the harmonized dataset.
It should be stated that the 10 m layers produced from the procedure above should not be used directly because they result from a statistical calibration procedure, and the discrimination between omission and actual change even if correct at the production unit level may not be spatially correct.In addition, the reference dataset PSUs used for the calibration procedure are at 100 × 100 m.Therefore, it is thought that the harmonized imperviousness change 100 m aggregated layers are expected to be more accurate.

Validation of Status and Change Layers
Both datasets, the CLMS IMD and the harmonized imperiousness change at 100 m resolution were validated based on the available CLMS validation data using the blind interpretation because the plausibility analysis was performed on the IMD and not on the harmonized imperiousness change data (see Table 3).From the table above, the harmonized imperiousness change dataset provides the most balanced results overall as compared with CLMS IMD with much more omission for CLMS IMD.This is particularly highlighted when PSUs classified as non-sealed in both the validation and map layers are excluded from the analysis (MAE > 0).This is because non-sealed areas dominate the landscape with more than 97% of the area identified as non-sealed in the reference data.For the validation of thematic classes covering a relatively small proportion of a study area such as impervious areas, the MAE > 0 is preferable for the validation of continuous thematic layers and clearly identifies the most accurate dataset.However, if the dataset is to be used to extract area statistics, a balance between omission and commission errors is necessary as exhibited by the harmonized impervious change dataset.
The conclusion was that the harmonized imperiousness change dataset provided a more realistic picture of the spatial-temporal distribution of sealed areas across Europe and was therefore used in subsequent steps.

Change Area Estimation
Regarding total sealed areas, the production of the 100 m harmonized imperviousness change dataset by applying the harmonization process as described in Section 3.2 appears to produce pixel count estimates that are much closer to the change validation dataset by almost a factor 9 as shown in Table 4.
However, the picture is slightly more contrasted when assessing area statistics for member states or a group of member states for smaller countries as illustrated in Figure 8.The harmonized imperiousness change 2018 dataset is still much closer to the reference data overall except for three countries/a group of countries (Benelux + Denmark, Czech Republic + Slovakia and Germany) for which the CLMS IMD2018 dataset is closer to the reference data.Otherwise, the harmonized imperiousness change 2018 dataset is always closer to the reference data and 12 out of 22 countries/country groupings are in fact within the 95% confidence intervals of the reference data.This was true for only 4 countries in case of the CLMS IMD2018 dataset.These results would suggest that even though the CLC+ artificial class (used as a basis for constructing the harmonized imperiousness change 2018 dataset) is more accurate overall than the IMD2018, this is not always the case locally and CLC+ appears to be substantially overestimating sealed areas in the three countries/groups of countries for which the IMD2018 dataset is closer.In all other cases, the IMD2018 is underestimating sealed areas as compared with the reference dataset.This considers that the level of change is still relatively small.With respect to change areas between 2015 and 2018 for which there was a major discontinuity in the IMD dataset as illustrated in Table 2, the harmonized imperiousness change dataset also appears to provide a better characterization of gain as illustrated in Table 5.The total gain area resulting from the adjustment procedure resulted in 1297 km 2 , close to the estimated 1273 km 2 estimated based on the validation data, as opposed to 3806 km 2 from the IMCC dataset.should now exhibit less omission errors.The 2006-2009 layer is slightly overestimating gains and would benefit from a similar approach to that of the initial 2015-2018 IMCC, but the difference is not as large (268 km 2 versus 2316 km 2 ).These results are broadly confirmed when disaggregated at the level of countries/groups of countries as illustrated in Figure 9 with a few exceptions.With respect to change areas between 2015 and 2018 for which there was a major discontinuity in the IMD dataset as illustrated in Table 2, the harmonized imperiousness change dataset also appears to provide a better characterization of gain as illustrated in Table 5.The total gain area resulting from the adjustment procedure resulted in 1297 km 2 , close to the estimated 1273 km 2 estimated based on the validation data, as opposed to 3806 km 2 from the IMCC dataset.Overall, the harmonized imperiousness change time series is much closer to the reference dataset with exceptions in Türkiye, Bulgaria, Austria, Switzerland and Liechtenstein and Iceland for which the CLMS IMD dataset is closer.In all other cases, the harmonized imperiousness change dataset is much closer to the reference data with 16 out of 22 countries/groups of countries within the 95% confidence interval range against only 8 out of 22 for the CLMS IMD dataset.This should also consider that the confidence intervals are quite large due to the relatively small number of samples classified as change for most countries/groups of countries.Nevertheless, this may suggest that the discriminating function applied to correct the 2015-2018 IMCC layer based on distance does not always appear to be sufficiently accurate and alternative approaches based on using confidence layers (which were not at our disposal) or a more sophisticated procedure based on Machine Learning/Deep Learning should further improve the quality of the results.
would benefit from a similar approach to that of the initial 2015-18 IMCC, but the difference is not as large (268 km 2 versus 2316 km²).These results are broadly confirmed when disaggregated at the level of countries / groups of countries as illustrated in Figure 9 with a few exceptions.Overall, the harmonized imperiousness change time series is much closer to the reference dataset with exceptions in Türkiye, Bulgaria, Austria, Switzerland and Liechtenstein and Iceland for which the CLMS IMD dataset is closer.In all other cases, the harmonized imperiousness change dataset is much closer to the reference data with 16 out of 22 countries / groups of countries within the 95% confidence interval range against only 8 out of 22 for the CLMS IMD dataset.This should also consider that the confidence intervals are quite large due to the relatively small number of samples classified as change for most countries / groups of countries.Nevertheless, this may suggest that the discriminating function applied to correct the 2015-18 IMCC layer based on distance does not always appear to be sufficiently accurate and alternative approaches based on using confidence layers (which were not at our disposal) or a more sophisticated procedure based on Machine Learning/Deep Learning should further improve the quality of the results.

Conclusions and Recommendations
The development and application of a harmonization methodology based on the combination of CLMS datasets with available reference sample observations were applied to the IMD dataset for the purpose of developing robust environmental indicators.The resulting datasets provide a much more temporally and spatially consistent time series than the original CLMS dataset: • The level of omission errors is reduced for all status layers.
• The level of commission errors is slightly increased but remains low and at a similar level to the level of omission errors, meaning that area statistics as extracted from the datasets should be close to reality, which was not the case previously.• The discontinuity in the time series between 2015 and 2018 is now resolved with changes in line with the expected level of increase over EEA38 + UK.
The use of CLC+ BB to improve the CLMS IMD status layers was a major improvement, but there are still some areas/countries for which the CLMS IMD areas appear closer to the reference data.Therefore, some more sophisticated data fusion approaches should be applied combining the best of the two datasets.The approach developed in this study is applicable to other datasets whenever spatio-temporal consistency is required and can be applied as long as suitable reference data (i.e., observations obtained from a probabilistic sample) are available to calibrate the changes.
Even if the area statistics from the harmonized change layer are more reliable, there is a substantial amount of error for both omission and commissions and the future production of more accurate 10 m change layers should be prioritized as part of the CLMS production.More sophisticated approaches based on confidence layers, Machine Learning/Deep Learning methods, image-to-image change detection methods and/or manual enhancements if required should be applied.Status layers should be produced from the combination of the previous status layer with the new change detection and not the other way around, meaning that the change layer should be produced first and detailed product specifications should be defined that ensure that the product is accurate but also that there is a balance between omission and commission errors.In addition, the historical dataset should be reproduced every time that a new status layer becomes available as long as its quality is improved.The historical layers can easily be reprocessed based on the accounting approach starting from the most recent layer and integrating the change layers for each time period as done in this study.
In addition, a reliable reference dataset should be produced and made available to provide a basis for calibrating the production of the change and status layer.The LUCAS dataset has the potential to provide this, but the specifications of LUCAS should be more closely related to those of the CLMS IMD layer in terms of the thematic definition as well as spatial support.Finally, the issue of calibrating EO-based layers at a more disaggregated level such as NUTS (Nomenclature des Unités Territoriales Statistiques/Nomenclature of Territorial Units for Statistics) level 3 for Europe is necessary to produce relevant statistical indicators and the development of small-area estimators as described in [37] should be considered in the production of a suitable reference dataset.

Figure 2 .
Figure 2.An example of a PSU (area of the image) and SSUs (red dots) within.SSUs are organized in a 20 m grid (25 sampling points) to assess impervious surfaces implemented by the CLMS service providers in the validation of the IMD layers.

Figure 2 .
Figure 2.An example of a PSU (area of the image) and SSUs (red dots) within.SSUs are organized in a 20 m grid (25 sampling points) to assess impervious surfaces implemented by the CLMS service providers in the validation of the IMD layers.

Figure 3 .
Figure 3.The level of reporting by country or aggregated countries (smaller countries were aggregated to ensure a minimum area of 90,000 km²) as applied in the Copernicus Land Monitoring Service external validation contract.

Figure 3 .
Figure 3.The level of reporting by country or aggregated countries (smaller countries were aggregated to ensure a minimum area of 90,000 km 2 ) as applied in the Copernicus Land Monitoring Service external validation contract.
wi is the extrapolation weight (inverse of the sampling intensity), and Root Mean Square Error (RMSE),  = ∑ ∑

Figure 5 .
Figure 5.The description of calibration units based on the combination of 300 by 300 km production units and environmental zones [33].

Figure 5 .
Figure 5.The description of calibration units based on the combination of 300 by 300 km production units and environmental zones [33].

Figure 6 .
Figure 6.Top: distance layer from 2015 imperviousness degree status layer (gray tones, with darke areas representing smallest distance) and candidate gain areas in red from 2015-18 imperviousnes classified change layer.Bottom: a corresponding is Sentinel 2 image.

Figure 6 .
Figure 6.(Top): distance layer from 2015 imperviousness degree status layer (gray tones, with darker areas representing smallest distance) and candidate gain areas in red from 2015-2018 imperviousness classified change layer.(Bottom): a corresponding is Sentinel 2 image.

Figure 7 .
Figure 7. Existing sealed areas are shown in blue whilst potential gains are classified as actual gain in red and omission in green.

Figure 7 .
Figure 7. Existing sealed areas are shown in blue whilst potential gains are classified as actual gain in red and omission in green.

Land 2024 , 20 Figure 8 .
Figure 8. Percentages of impervious areas for large countries and groups of countries from EEA38 + UK for the validation reference, imperviousness degree and harmonized imperiousness change datasets for 2018 with the 95% confidence interval for the reference data.

Figure 8 .
Figure 8. Percentages of impervious areas for large countries and groups of countries from EEA38 + UK for the validation reference, imperviousness degree and harmonized imperiousness change datasets for 2018 with the 95% confidence interval for the reference data.

Figure 9 .
Figure 9.Total gain in impervious areas for countries / group of countries for 2015-2018 from EEA38 + UK for validation reference, CLMS IMD and harmonized imperiousness change datasets for 2018 with 95% confidence interval for reference data.

Figure 9 .
Figure 9.Total gain in impervious areas for countries/group of countries for 2015-2018 from EEA38 + UK for validation reference, CLMS IMD and harmonized imperiousness change datasets for 2018 with 95% confidence interval for reference data.

•
Stratum 2, High-Probability Omission strata, where the imperviousness degree is 0% in the 2015 reference year but where the Open Street Map and the traditional CLC 2015 layers indicate "impervious classes".• Stratum 3, Low-Probability Omission strata covering the rest of the area.• Stratum 4, commission Change Strata with all changes between 2006-2009, 2009-2012, 2012-2015 and 2015-2018 [gain, loss, increased and decreased were combined due to the very small area covered].

•
Stratum 2, High-Probability Omission strata, where the imperviousness degree is 0% in the 2015 reference year but where the Open Street Map and the traditional CLC 2015 layers indicate "impervious classes".• Stratum 3, Low-Probability Omission strata covering the rest of the area.• Stratum 4, commission Change Strata with all changes between 2006-2009, 2009-2012, 2012-2015 and 2015-2018 [gain, loss, increased and decreased were combined due to the very small area covered].

Table 1 .
Assessment of difference in area (km 2 ) between available status layers and CLMS validation data for the EEA countries + UK.

Table 2 .
Assessment of differences in area (km 2 ) between the IMCC dataset and CLMS Validation data for gains for the EEA38 countries and the UK (Red and amber, respectively, indicate large and smaller difference with upper limit of 95% CI).

Table 3 .
Comparison of validation results for blind interpretation CLMS IMD layers and harmonized imperiousness change datasets (Red and Green, respectively, indicate the worst and best results whilst amber indicates intermediate result or a result relatively close to the best one).

Table 4 .
Assessment of differences in area (km 2 ) between CLMS IMD and harmonized imperiousness change status layers and CLMS Validation data for the EEA38 countries and the UK (Red and Green, respectively, indicate the worst and best results).

Table 5 .
Comparison between imperviousness gain area estimates from CLMS validation data, initial CLC+ BB/IMCC layers and revised harmonized CLC+ BB/IMCC layer.(Red and amber, respectively, indicate large and smaller difference with upper limit of 95% CI).The gain areas are now in line with the validation data for 2015-2018.It was already the case for 2009-2012 and 2012-2015, but the status layers for 2015, 2012, 2009 and 2006

Table 5 .
Comparison between imperviousness gain area estimates from CLMS validation data, initial CLC+ BB/IMCC layers and revised harmonized CLC+ BB/IMCC layer.(Red and amber, respectively, indicate large and smaller difference with upper limit of 95% CI).The gain areas are now in line with the validation data for 2015-18.It was already the case for 2009-2012 and 2012-15, but the status layers for 2015, 2012, 2009 and 2006 should now exhibit less omission errors.The 2006-2009 layer is slightly overestimating gains and