Spatial and semantic effects of LUCAS samples on fully automated land use/land cover classification in high-resolution Sentinel-2 data

In this study, we test the use of Land Use and Coverage Area frame Survey (LUCAS) in-situ reference data for classifying high-resolution Sentinel-2 imagery at a large scale. We compare several pre-processing schemes (PS) for LUCAS data and propose a new PS for a fully automated classification of satellite imagery on the national level. The image data utilizes a high-dimensional Sentinel-2-based image feature space. Key elements of LUCAS data pre-processing include two positioning approaches and three semantic selection approaches. The latter approaches differ in the applied quality measures for identifying valid reference points and by the number of LU/LC classes (7–12). In an iterative training process, the impact of the chosen PS on a Random Forest image classifier is evaluated. The results are compared to LUCAS reference points that are not pre-processed, which act as a benchmark, and the classification quality is evaluated by independent sets of validation points. The classification results show that the positional correction of LUCAS points has an especially positive effect on the overall classification accuracy. On average, this improves the accuracy by 3.7%. This improvement is lowest for the most rigid sample selection approach, PS 2, and highest for the benchmark data set, PS 0. The highest overall accuracy is 93.1% which is achieved by using the newly developed PS 3; all PS achieve overall accuracies of 80% and higher on average. While the difference in overall accuracy between the PS is likely to be influenced by the respective number of LU/LC classes, we conclude that, overall, LUCAS in-situ data is a suitable source for reference information for large scale high resolution LC mapping using Sentinel-2 imagery. Existing sample selection approaches developed for Landsat imagery can be transferred to Sentinel-2 imagery, achieving comparable semantic accuracies while increasing the spatial resolution.
The resulting LC classification product that uses the newly developed PS is available for Germany via DOI: https://doi.org/10.15489/1ccmlap3mn39.


Introduction
Satellite-based mapping of land use/land cover (LU/LC) provides objective, up-to-date information on both the current state of and changes occurring on the Earth's surface. Recent advances in satellite technology, image classification techniques, and processing infrastructure allow for large-area data analyses, e.g., on a regional (Leinenkugel et al., 2019), national (Mack et al., 2017), continental (Pflugmacher et al., 2019), or even global level (Chen et al., 2014). In multi-temporal analyses, land cover change is monitored to detect land surface dynamics such as alterations in the global forest stand (Hansen et al., 2013) and urban growth (Taubenböck et al., 2012). While low to medium-resolution satellite imagery (MODIS, Landsat) was deployed for large-scale analyses in the past, recent studies have also utilized Sentinel-2 data at a geometric resolution of 10 m × 10 m for LU/LC mapping (e.g., Close et al., 2018). With their high temporal and spatial coverage and high spatial resolution, the freely accessible European Sentinel-2 satellite images are viable contenders for LU/LC mapping (Sánchez-Espinosa and Schröder, 2019).
One state-of-the-art method for LU/LC mapping utilizes modern machine learning techniques. The basic concept of image classification can be broken down into two steps: first, learning the classification model from labeled reference data and second, its prediction to all pixels of the satellite imagery. The quality of the classified image depends on the employed machine learning algorithm (Maxwell et al., 2018). Moreover, the accuracy is significantly influenced by the quality, quantity, spatial and semantic distribution, and positional correctness of the reference data. The latter has played a minor role for low to medium-resolution satellite images; however, it is considered to be a significant source of classification errors for imagery with increased resolutions. As an example, for an image with a geometric resolution of 10 m, a positional error of ∼5 m for the ground truth reference point may lead to incorrect learning by the model and thus to poorer results in thematic map accuracy. For this reason, high-quality reference data are crucial for the creation of reliable LU/LC maps.
In recent years, vast LU/LC reference databases have been created. For example, the Geo-Wiki project (Fritz et al., 2017, https://geo-wiki.org), a global LU/LC reference data set, provides ∼150,000 reference points gathered using crowd-sourcing and visual image interpretation. The number of samples available in some regions, however, is relatively low despite the enormous effort undertaken to provide a consistent reference data set of LU/LC information; e.g., as of December 2019, there are reference points for 374 distinct locations in Germany. Other approaches use volunteered geographic information (VGI) from the OpenStreetMap project (OSM, https://openstreetmap.org) as reference data for remote sensing image classification (Schultz et al., 2017; Wan et al., 2017; Maggiori et al., 2017). The data, however, vary in spatial and semantic consistency. One alternative to these reference data sets for Europe is the Land Use and Coverage Area frame Survey (LUCAS). This pan-European survey, coordinated by the statistical office of the European Union (EU) (EUROSTAT), offers highly detailed in-situ LU/LC information of 270,000 ground truth points across all 28 EU member states. In contrast to the aforementioned crowd-sourced approaches, trained experts collect the information in the field, following strict guidelines to ensure high levels of consistency. At every point, LU/LC information, environmental characteristics, and information on spatial accuracy are collected. LU/LC are registered using a hierarchical classification scheme (EUROSTAT, 2015b), which differentiates between the following major land cover classes: artificial land (A), cropland (B), woodland (C), shrub land (D), grassland (E), bare soil, moss, and lichens (F), water (G), and wetlands (H). At a higher thematic level, the LUCAS database comprises 84 sub-classes.
In recent studies, this database has been utilized as training and validation data in combination with remote sensing data. Some related studies focused on vegetative land cover to derive green cover maps (Zillmann et al., 2014;Tassopoulou et al., 2019), agricultural inventories (Conrad et al., 2010;Esch et al., 2014;Kussul et al., 2018) or vegetation monitoring (Khaliq et al., 2018). Others used the broad spectrum of land cover types for more semantically holistic approaches (Mack et al., 2017;Close et al., 2018;Pflugmacher et al., 2019;Leinenkugel et al., 2019). Most commonly, especially for large-scale approaches, Landsat imagery at a resolution of 30 m pixels is utilized for such applications. In contrast, LUCAS combined with Sentinel-2, with a 10 m geometric resolution, has only been used for local or regional scales (Khaliq et al., 2018;Close et al., 2018). A large-scale national LU/LC classification using Sentinel-2 is therefore still lacking.
Compared to Landsat, Sentinel-2 offers a spatial resolution that is nine times higher for multi-spectral images. Therefore, positional errors in the ground truth data can significantly decrease the classification quality. During the LUCAS survey, the position from which a sample point is surveyed is captured via GPS devices. This method is prone to positional GPS errors, which are a source of uncertainty discussed in previous remote sensing applications research (e.g. Pflugmacher et al., 2019). Moreover, the LU/LC information recorded for each LUCAS point always refers to the originally intended location, which is also referred to as the theoretical location (EUROSTAT, 2015b). This theoretical point cannot always be reached at the time of the survey as it may be located on a water body or private property (EUROSTAT, 2015a). This can lead to a spatial offset between the GPS point and the observed theoretical location. When the GPS position is used to extract spectral information for a subsequent machine learning process, it can lead to a decrease in classification accuracy.
In addition to the position of the LUCAS points, the semantic and qualitative information collected for each LUCAS point can be used to select suitable reference data. In related studies, we identified different approaches of handling this selection process. These approaches vary in terms of semantic detail and quality criteria which are applied to select valid samples. While Pflugmacher et al. (2019), for example, use a total of 12 LU/LC classes, Close et al. (2018) only use five target classes. These differences in the number of distinct classes can impact the performance of the classification model (Ma et al., 2017), and yet, a joint evaluation of these approaches of pre-processing LUCAS input data for their use in remote sensing classification is still missing.
This study therefore aims to provide a comprehensive evaluation of the spatial and semantic effects of LUCAS samples on large-scale LU/LC classifications. We apply three approaches for selecting LUCAS sample data for their use in large-scale high-resolution land cover mapping, which are hereafter called pre-processing schemes (PS). Two recently proposed extensive remote sensing image classification approaches based on LUCAS in-situ samples by Mack et al. (2017) and Pflugmacher et al. (2019) are reviewed; they were originally designed for national or international LU/LC classification using Landsat imagery. We transfer these approaches to 10 m spatial resolution to test their applicability to Sentinel-2 imagery. Furthermore, we propose a third PS for high-resolution LU/LC analyses. This new PS is designed to contain seven land cover classes: artificial land, four vegetative land cover types that differ in height and seasonality of growth, bare land, and water. Because positioning is crucial in applications with high-resolution imagery, one focus of this study is on the positioning of the LUCAS samples. Finally, one large-area land cover map is created using the newly proposed PS.
In Section 2, we introduce the data and methods used in this study and the experimental setup. Section 3 provides a detailed description of the results. The findings are discussed in Section 4 and summarized in Section 5.

Material and methods
In this study, we compare different PSs for Sentinel-2 image classification. Therefore, we review existing approaches and their specific methods for selecting LUCAS in-situ samples for classification. To test their spatial and semantic effects we apply the following methodological workflow (see Fig. 1). First, the nationwide LUCAS reference data and Sentinel-2 imagery are collected and pre-processed. Second, in an iterative modeling process, we assess the impact of different spatial and semantic PSs on the classification accuracy. Third, one model is deployed for large-scale classification using the input image data. The individual steps and parameters are described in detail in the following subsections.

LUCAS reference data
The pan-European LUCAS is a comprehensive acquisition framework of in-situ information launched by the EU in 2001. Every three years observers collect detailed in-field information that is systematically distributed in a 2 km × 2 km grid throughout Europe. This spatially stratified distribution reduces the risk of spatial auto-correlation, which is an important property of reference data (Stehman, 2009;Aune-Lundberg and Strand, 2014;Geiß et al., 2017). In the field, photographs of the individual points are taken, and ecological parameters are documented (Orgiazzi et al., 2018). The collected in-situ information is published and is accompanied by detailed metadata concerning the quality of the observation itself, such as the acquisition date or GPS location of the observer in the field.

LUCAS positioning
By design, each LUCAS sample point is located at the intersection of a regular 2 km × 2 km INSPIRE grid (https://inspire.ec.europa.eu/) where the LU/LC information is surveyed. In the 2015 survey, however, the observers were unable to reach 88.3% of the sample points at their exact location because they were located on private properties, in inaccessible (wet-)lands, or in dense urban built-up structures. These points can generally be observed from a distance. In such cases, the GPS location from which the theoretical point is observed and the distance to it are recorded (Karydas et al., 2015). This means that the recorded GPS position, at which the observer collects the information of the sample point, does not necessarily spatially correspond to the theoretical point location. For example, a LUCAS point located in the sea can be assessed from the shore. Up until the 2015 LUCAS survey, the data only included these GPS locations as the geographic coordinates. Accordingly, the observation distance was used as a quality criterion for selecting LUCAS samples (e.g. Pflugmacher et al., 2019), which suggests that the GPS locations were used for locating the LUCAS samples in geographic space. However, especially for high-resolution imagery, very small spatial inaccuracies can introduce errors to the LC classification process. For example, a sample point located on water that is assessed from the shore would, as a consequence of the shifted position, provide an incorrect spectral signature for any subsequent machine learning processes. In contrast, applying strict rules for excluding samples that are evaluated from a distance leads to the loss of many sample points. The theoretical location of the sample points (for readability hereafter referred to as GRID), that is, the location for which the LC information is recorded, is encoded inside the point ID. It can either be reconstructed, e.g., LUCAS-ID = 12345678, northing = 1,234,000, and easting = 5,678,000 with EPSG:3035, or it can be acquired separately. For the 2015 survey, the average distance between the GRID and GPS locations is 24.22 m (median = 5 m, 1st quartile = 2 m, and 3rd quartile = 16 m). This issue has not yet been addressed in the literature. In this study we therefore systematically assess the error introduced to a classification result when using the GPS coordinate pair.

M. Weigand, et al. Int J Appl Earth Obs Geoinformation 88 (2020) 102065
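The reconstruction of the GRID location from the point ID, and the offset to the recorded GPS position, could be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, each four-digit half of the ID is assumed to be a coordinate in kilometres, and the axis order simply follows the example given in the text (first half = northing). The order should be checked against the LUCAS documentation before use, which is why it is exposed as a parameter.

```python
import math


def grid_from_lucas_id(point_id, first_is_northing=True):
    """Return (easting, northing) in metres (EPSG:3035) from an 8-digit LUCAS-ID.

    Assumption: each 4-digit half encodes a coordinate in kilometres.
    The default axis order mirrors the example in the text; pass
    first_is_northing=False if the data uses the opposite order.
    """
    digits = str(point_id)
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError("expected an 8-digit LUCAS-ID")
    first = int(digits[:4]) * 1000
    second = int(digits[4:]) * 1000
    northing, easting = (first, second) if first_is_northing else (second, first)
    return easting, northing


def offset_m(grid_xy, gps_xy):
    """Euclidean distance in metres between two points in the same projected CRS."""
    return math.hypot(grid_xy[0] - gps_xy[0], grid_xy[1] - gps_xy[1])


# Example from the text: LUCAS-ID 12345678.
easting, northing = grid_from_lucas_id(12345678)
```

The offset function assumes the GPS position has already been projected to EPSG:3035; comparing it with the reconstructed GRID location gives the kind of distance summarized above.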

LUCAS sample pre-processing
In addition to positional information, the main information provided by the LUCAS database consists of the LU/LC samples. The physical appearance of the land cover in remote sensing imagery is used in classification, and therefore its physical representation is decisive for the classifier. Land use types, in contrast, are semantic and therefore implicitly used in this data approach. Hence, it is common in related studies to reclassify the original LUCAS LU/LC hierarchy. One example of this is tree plantation land cover, such as cherry tree cultures. In the original LUCAS hierarchy, these are considered to be cropland (B), although their physical appearance (e.g., vegetative height) differs from most other low vegetation agricultural land covers included in class B.
In an image classification workflow, this inevitably introduces class heterogeneity which then leads to lower classification performance. Furthermore, LUCAS samples contain a multitude of qualitative and quantitative metadata. Existing studies using LUCAS as reference data utilize these metrics to select suitable points of high quality for training and validating a remote sensing image classification (e.g., Conrad et al., 2010;Mack et al., 2017;Pflugmacher et al., 2019). Thereby, each PS is motivated by the land cover classes under investigation and satellite spatial resolutions.
To evaluate the effects of reclassifying and selecting the LUCAS samples, we introduce four different schemes: PS 0 as a benchmark, PS 1 and PS 2, which are adapted from recent studies, and PS 3 as a newly developed PS for LUCAS data. Table 1 summarizes the number of samples after pre-processing.
PS 0 : As a reference pre-processing scheme, we do not filter any LUCAS points based on attribute information but merely reduce the LC classes to the original eight main classes (A-H). Invalid GPS points are excluded from the sample.
PS 1 : The scheme developed by Mack et al. (2017) preserves all seven non-woodland land cover classes of the original LUCAS hierarchy and distinguishes the two woodland sub-classes: broadleaved woodlands (C1) and coniferous woodlands (C2). Consequently, there are nine land cover classes in total. Furthermore, striving for pure Landsat pixels, the authors exclude samples located within a homogeneous patch smaller than 1 ha if they are not classified as artificial land (A). This exception was important for the classification process, as almost no built-up samples would have remained otherwise (Mack et al., 2017). Manual spatial corrections, as performed in the original work by Mack et al. (2017), were not performed in this study.
PS 2 : For a pan-European application, Pflugmacher et al. (2019) differentiate both woodlands and croplands based on their seasonality. This sampling scheme also differentiates mixed woodlands (C3) and separates permanent snow/glaciers (G50) from other water areas, creating 12 land cover classes in total. Furthermore, the authors utilize detailed LUCAS attributes to exclude specific classes such as non-built-up linear features (A22), which are not detectable with coarse resolution imagery, and temporary grasslands (B55). With respect to the spatial resolution of Landsat-8, several qualitative filters are applied. They include a minimum patch size of 0.5 ha, a land cover proportion larger than 50%, and, most importantly, a maximum observation distance of 30 m. For both artificial land and perennial cropland, more relaxed filtering criteria were chosen by the authors. For more details, refer to Pflugmacher et al. (2019).
PS 3 : Building on previous approaches, we propose the following new LUCAS PS for high-resolution land cover mapping using Sentinel-2. With respect to the high spatial resolution of Sentinel-2, all samples that are not artificial land must be located in a homogeneous patch larger than 0.5 ha. Furthermore, only regular "on the point" observations were kept, excluding samples which are located on a boundary between two LU/LC types, whether on an edge or small linear feature (cf. EUROSTAT, 2015a, p. 18). As several different land cover types can co-exist, e.g., agroforestry (cf. EUROSTAT, 2015a, p. 26), all samples with multiple land cover registrations or samples that are located within small vegetative patches less than 20 m × 20 m in size (cf. EUROSTAT, 2015a, p. 25) are excluded. Water areas and wetlands that are temporarily dry/flooded at the time of the visit (e.g., a riverbed, cf. EUROSTAT, 2015a, p. 99) are also excluded, as they represent very temporally heterogeneous land cover. This PS discriminates between artificial land and natural surfaces. The latter include vegetative surfaces, bare soil, and water. Due to their environmental and micro-climatic implications, vegetative surfaces are differentiated into four categories by height and seasonality of growth: high perennial vegetation subsumes coniferous tree covers, whereas high seasonal vegetation consists of deciduous trees. Similarly, low seasonal vegetation cover is characterized by strong seasonal variations throughout the year, such as croplands. Low perennial vegetation is more persistent, e.g., meadows or shrubs. We therefore introduce a new nomenclature of classes. This new class hierarchy is summarized in Table 2. While some classes are separable on the ground for a human observer, we anticipate no or poor separability from space (Conrad et al., 2010), such as a narrow road overgrown with trees. Therefore, these classes were excluded from this remote sensing classification workflow (cf. Table 2). From the original eight LUCAS classes, seven new LC classes were formed during pre-processing, which are proposed here for remote sensing image classification with Sentinel-2.
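The selection rules of PS 3 listed above can be read as a simple per-sample filter. The sketch below is only an illustration under stated assumptions: the dictionary keys (`lc_class`, `patch_area_ha`, `obs_type`, `n_land_covers`, `small_vegetation_patch`, `temporarily_dry_or_flooded`) are hypothetical names, not the actual LUCAS attribute names, and the reclassification into the seven new classes (Table 2) is not covered.

```python
def keep_for_ps3(sample):
    """Return True if a sample passes the PS 3 selection rules described in the text.

    All keys of `sample` are hypothetical placeholders for LUCAS attributes.
    """
    lc = sample["lc_class"]  # main LUCAS class letter, "A" to "H"
    # Non-artificial samples need a homogeneous patch larger than 0.5 ha.
    if lc != "A" and sample["patch_area_ha"] <= 0.5:
        return False
    # Keep only regular "on the point" observations (no boundary/edge/linear cases).
    if sample["obs_type"] != "on_point":
        return False
    # Exclude samples with multiple land cover registrations.
    if sample["n_land_covers"] > 1:
        return False
    # Exclude small vegetative patches (less than 20 m x 20 m).
    if sample["small_vegetation_patch"]:
        return False
    # Exclude water (G) and wetlands (H) that were temporarily dry or flooded.
    if lc in ("G", "H") and sample["temporarily_dry_or_flooded"]:
        return False
    return True


example = {
    "lc_class": "C", "patch_area_ha": 2.0, "obs_type": "on_point",
    "n_land_covers": 1, "small_vegetation_patch": False,
    "temporarily_dry_or_flooded": False,
}
selected = [s for s in [example] if keep_for_ps3(s)]
```

Keeping each rule as its own early return makes it straightforward to switch individual criteria on or off when comparing schemes.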

Image data
To describe the spectral, temporal, and spatial variability of the LU/LC types, we compose a high dimensional feature space combining Earth observation with auxiliary geo-information.
With their 10 m × 10 m spatial resolution and 13 spectral bands, the European Sentinel-2 satellites provide data free of charge. This makes them a valuable data source for large-scale applications. Consequently, we use all Sentinel-2 (Level 1C) scenes with a cloud cover below 60% which were acquired over Germany. Additional cloud masking is performed, utilizing the QA60 band provided in the Sentinel-2 imagery. We only use data from June 2015 to April 2017 (N = 389 scenes). From these we conflate three kinds of image data (cf. Table 3): (a) a composite of median mosaics of the spectral bands 2, 3, 4, and 8 with 10 m spatial resolution; (b) the Normalized Difference Vegetation Index (NDVI, Rouse et al., 1974), Normalized Difference Water Index (NDWI, McFeeters, 1996), and Normalized Difference Built-Up Index (NDBI, Zha et al., 2003), which have already been successfully applied in related LU/LC classifications (Griffiths et al., 2013; Pelletier et al., 2016; Leinenkugel et al., 2019; Tassopoulou et al., 2019) and are calculated for all scenes and reduced to the 25th, 50th, and 75th percentiles, resulting in three mosaics for each index; and (c) auxiliary imperviousness information included from external sources. While the input imagery for creating the spectral median mosaics (a) is limited to acquisition dates within the observation period of LUCAS (May through September), the index percentiles (b) are derived from the imagery throughout all seasons. Thus, the latter allow for the depiction of seasonality of vegetated areas throughout the year and provide useful information for discriminating vegetative LC types (Griffiths et al., 2014, 2019; Mack et al., 2017).

Table 1
The number of LUCAS samples in Germany for the PS tested in this study. The positioning approach is indicated by GRID and GPS; the differences are due to invalid GPS points and the observation distances exceeding a certain value.
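The index layers described under (b) can be illustrated for a single pixel as follows. This is a hedged sketch, not the processing chain used in the study: the band roles in the comments (B8 as near infrared, B4 as red, B3 as green, B11 as shortwave infrared) are the usual Sentinel-2 assignments and are an assumption here, the percentile method (`inclusive` interpolation) is likewise an assumption, and cloud masking is taken as already done.

```python
from statistics import quantiles


def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b); 0.0 if both inputs are 0."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); assumed bands: B8, B4."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """NDWI after McFeeters = (Green - NIR) / (Green + NIR); assumed bands: B3, B8."""
    return normalized_difference(green, nir)


def ndbi(swir, nir):
    """NDBI = (SWIR - NIR) / (SWIR + NIR); assumed bands: B11, B8."""
    return normalized_difference(swir, nir)


def index_percentiles(values):
    """25th, 50th, and 75th percentile of one pixel's index time series.

    Needs at least two cloud-free observations.
    """
    q25, q50, q75 = quantiles(values, n=4, method="inclusive")
    return q25, q50, q75


# One pixel observed in five cloud-free scenes: (NIR, red) reflectance pairs.
observations = [(0.30, 0.10), (0.45, 0.08), (0.50, 0.05), (0.40, 0.09), (0.25, 0.12)]
ndvi_series = [ndvi(nir, red) for nir, red in observations]
ndvi_q25, ndvi_q50, ndvi_q75 = index_percentiles(ndvi_series)
```

Applying the same reduction to every pixel and every index yields the three percentile mosaics per index mentioned in the text.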
To increase the predictive power of the machine learning algorithm, additional features are derived from these initial image layers. It has been shown that textural features significantly increase land cover classification accuracy (Khatami et al., 2016;Li et al., 2014). We argue that with a spatial resolution of 10 m × 10 m, many of the LC classes such as urban environments are defined by their adjacent neighborhood of pixels, especially since several LUCAS samples describe the local land cover within a circular range of up to 20 m (Karydas et al., 2015). We therefore derive first-order focal textures (median, mean, and standard deviation) for three kernels (window sizes of 3 pixels, 5 pixels, and 9 pixels) for all 13 image layers. Furthermore, second-order GLCM textures (Haralick et al., 1973) are calculated for the four spectral mosaics. Overall, a total of 226 image layers are derived from the Sentinel-2 imagery.
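The first-order focal textures can be sketched for a small raster as shown below. The function name is hypothetical, and two details are assumptions not specified in the text: edge pixels use only the part of the window that lies inside the image, and the standard deviation is the population form. GLCM textures are not covered here.

```python
from statistics import mean, median, pstdev


def focal_stats(image, size):
    """Focal mean, median, and standard deviation for an odd window size.

    `image` is a list of equally long rows of numbers. Windows are clipped
    at the image border (an assumption; other edge rules are possible).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    rows, cols = len(image), len(image[0])
    result = {"mean": [], "median": [], "std": []}
    for r in range(rows):
        row_mean, row_median, row_std = [], [], []
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            row_mean.append(mean(window))
            row_median.append(median(window))
            row_std.append(pstdev(window))
        result["mean"].append(row_mean)
        result["median"].append(row_median)
        result["std"].append(row_std)
    return result


layer = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Three kernels as in the text: 3, 5, and 9 pixels.
textures = {size: focal_stats(layer, size) for size in (3, 5, 9)}
```

For real mosaics an array library with a moving-window filter would be far faster; the plain loops are only meant to make the definition explicit.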
Additional information, especially for differentiating highly heterogeneous LC patches of artificial land, is provided through (c) nationwide imperviousness layers that are generated for rails and roads using data from the OpenStreetMap project and building footprints from level-of-detail 1 (LoD-1) building models provided by the German Federal Agency for Cartography and Geodesy. Each 10 m × 10 m pixel is assigned to the relative mutual overlap with each of the respective polygon layers. Including the imperviousness images, a total of 229 layers are used as a database for image classification (see Table 3).

Modeling
Preparing for classification and evaluation, the LUCAS samples are split into independent, stratified sets of samples by a ratio of 80% for training and 20% for validation. To classify the image, a random forest classifier (Breiman, 2001) is applied since it has previously proven to provide sufficient results even in highly dimensional feature spaces (Geiß et al., 2015;Khatami et al., 2016;Pelletier et al., 2016;Wurm et al., 2017).
Since both sample splitting (training versus validation) and the generation of random forests rely on random samples, the implications of the random selection cannot be ruled out. Taking this into account, 100 random models are created for each experiment.
The performance of all models is evaluated against the overall accuracy (OA) (Congalton, 1991). The variation in OA values across the 100 iterations per experiment is summarized using the standard deviation as a measure of robustness.
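The repeated evaluation described in this subsection can be outlined as below. This is a sketch under clear assumptions: all function names are hypothetical, the classifier is passed in as a callable, and the `majority_class` function is only a toy stand-in so the example runs without external libraries. It is not a random forest; in practice the callable would wrap a random forest implementation. Whether the study used the sample or population standard deviation is not stated; the sample form is used here.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, stdev


def stratified_split(labels, train_fraction, rng):
    """Split sample indices per class so both sets keep the class mix."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, valid = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        valid.extend(indices[cut:])
    return train, valid


def overall_accuracy(reference, predicted):
    """Share of validation samples whose predicted class matches the reference."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def majority_class(train_x, train_y, valid_x):
    """Toy stand-in classifier (NOT a random forest): predicts the most common class."""
    winner = Counter(train_y).most_common(1)[0][0]
    return [winner] * len(valid_x)


def repeat_experiment(features, labels, fit_predict, n_runs=100, seed=0):
    """Mean and standard deviation of OA over n_runs random 80/20 splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        train, valid = stratified_split(labels, 0.8, rng)
        predicted = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in valid],
        )
        scores.append(overall_accuracy([labels[i] for i in valid], predicted))
    return mean(scores), stdev(scores)


toy_labels = ["forest"] * 10 + ["water"] * 5
toy_features = list(range(len(toy_labels)))
mean_oa, sd_oa = repeat_experiment(toy_features, toy_labels, majority_class)
```

Running one such experiment per combination of PS and positioning approach gives the per-experiment mean OA and its spread.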

Image classification
Out of the 800 conducted experiments, the highest performing model is selected based on OA. This model is eventually deployed for classifying the image data, resulting in a large-scale land cover classification product for Germany. As the final step, the results of the map are evaluated using marginal proportional map accuracy estimation (Card, 1982; Olofsson et al., 2014; Stehman and Foody, 2019). The error matrix of the stratified random samples is therefore weighted by the occurrence of each class in the final map. This weighted error matrix is then used to calculate estimates of OA (Ô), user accuracy (Û), and producer accuracy (P̂), and their respective standard errors.
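The weighting of the error matrix by map class proportions can be illustrated with the commonly used stratified estimators, in which the estimated area proportion of a cell is p_ij = W_i · n_ij / n_i. The sketch below assumes that rows are map classes, columns are reference classes, and every map class has at least one sample; the function name and the toy numbers are hypothetical, and the standard errors are left out.

```python
def area_weighted_accuracy(counts, map_proportions):
    """Estimate overall, user's, and producer's accuracy from a sample error matrix.

    counts[i][j]: number of samples mapped as class i with reference class j.
    map_proportions[i]: share W_i of the map area classified as class i (sums to 1).
    """
    k = len(counts)
    row_totals = [sum(row) for row in counts]
    # Estimated area proportions: p_ij = W_i * n_ij / n_i.
    p = [
        [map_proportions[i] * counts[i][j] / row_totals[i] for j in range(k)]
        for i in range(k)
    ]
    overall = sum(p[i][i] for i in range(k))
    # User's accuracy: diagonal cell divided by its row (map class) total.
    users = [p[i][i] / sum(p[i]) for i in range(k)]
    # Producer's accuracy: diagonal cell divided by its column (reference) total.
    col_totals = [sum(p[i][j] for i in range(k)) for j in range(k)]
    producers = [p[j][j] / col_totals[j] for j in range(k)]
    return overall, users, producers


# Toy two-class example: class 0 covers 70% of the map, class 1 covers 30%.
toy_counts = [[45, 5], [10, 40]]
toy_weights = [0.7, 0.3]
oa_hat, user_hat, producer_hat = area_weighted_accuracy(toy_counts, toy_weights)
```

In the toy example the unweighted accuracy would be 85 of 100 samples, whereas the area-weighted estimate is 0.87, because the better-mapped class covers more of the map.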

Overall evaluation
The results of the overall model performance provide insights into the effects of the different pre-processing schemes (PS 0 -PS 3 ) applied to the LUCAS samples and sample positioning of points (GPS and GRID). All results are summarized in Fig. 2. The bar plot represents the average OA for the experiment grouped by PS and positioning approach. The error margins introduced by the set of random numbers are indicated by the error bars.
We find a notable difference between the presented PSs of the LUCAS sample data. The OAs for all the experiments range from 78.1% to 93.6%. On average, the standard deviation for all experiments is 0.57% (max. 0.76%).
When comparing the positioning approaches GPS and GRID, it can be seen that the latter leads to a consistently higher overall accuracy for all experiments. The increases in OA vary depending on the PS, from 2.0% for PS 2 to 4.7% for PS 0 . An ANOVA and a subsequent Tukey-Duckworth test quantify the difference between the positioning approaches at 3.7% for the OA.

Table 3
Overview of the image layers calculated from Sentinel-2 mosaics and auxiliary data. Focal statistics and Gray Level Co-occurrence Matrix (GLCM) were derived in three kernel sizes.

Application of the newly developed PS: large-area land cover classification
After assessing the overall model performances, we apply the model created with PS 3 and GRID locations to the image data for the federal territory of Germany. Fig. 3 depicts the complete classification result and Table 4 presents a detailed estimated error matrix. The map illustrates the heterogeneity of landscapes in Germany at different scales. On the national level, a trend of low vegetation in the north and higher vegetation in the south of the country is outlined. Landmarks like the Central German Uplands landscape are clearly discriminated by dense tree vegetation. Detailed views on the metropolitan area of Berlin in Fig. 3 expose a complex web of interconnected urban patches, towns, and villages with intermediary croplands and meadows. Furthermore, the detailed magnification shows a pattern of fine structures like individual roads and different vegetative covers (e.g., arrangements of broadleaved and coniferous forest) being discriminated. Even in densely built areas, small patches of urban green are recognized by the classifier in the high-resolution imagery. Overall, the classification reaches an estimated accuracy (Ô) of 93.07% (SE = 0.43, see Table 4).
In terms of class-based accuracy, most classes achieve a fairly sufficient agreement in estimated user accuracy (Û > 84%). The estimated producer accuracy, in turn, achieves high values for most classes (P̂ > 90%). Open soil and water, however, achieve a significantly lower estimated producer accuracy of 39.78% (SE = 10.18%) and 68.94% (SE = 4.91%), respectively, indicating that there are higher omission errors for these LC types. While the inter-class scattering is mostly between vegetated classes, artificial land shows a slight overlap with all other classes.

Fig. 3. Overview of the land cover map produced using Sentinel-2 imagery and LUCAS samples that were pre-processed according to PS 3. Zoom windows show the level of detail ranging from the landscape view, urban structure, and fine grain land cover patterns.

Table 4
Estimated error matrix for the final classification with estimates for overall accuracy (Ô), user accuracy (Û), and producer accuracy (P̂) (see Stehman and Foody (2019)).

LUCAS sample pre-processing
The experiments in this study reveal that the semantic pre-processing of LUCAS reference data has a strong impact on the accuracy of the classification results. That being said, a direct comparison between the PSs is not justified. All PSs are developed to fit a specific purpose of pan-European or national LC mapping. They differ in sample selection and semantic class hierarchy. Their accuracy depends on the number of classes and validation samples used to assess the classification model. This results in different conditions between the PSs, rendering a direct comparison impossible. In general, a PS with fewer target classes achieves a higher accuracy, a trend that has also been observed in other studies (Ma et al., 2017). This effect is likely because fewer classes exhibit more distinct spectral signatures, and thus, classification schemes with more target classes are more prone to misclassification.
Of the three pre-processing schemes, the results of PS 2 show the highest rate of misclassification. It is likely that this is due to the large number of LC classes (12) and relatively few remaining samples (≈14,600, cf. Table 1). This PS was originally developed by Pflugmacher et al. (2019) for pan-European classification purposes. In a smaller region, in this case Germany, substantially fewer reference points are available for training the machine learning model. Hence, one restrictive issue for using LUCAS samples as training data for classifying remote sensing imagery is that it can only be successfully applied at the regional or national scale. The smaller the area of interest, the more unlikely the LUCAS samples are to be suitable for creating a stable model (cf. Leinenkugel et al., 2019). Therefore, in some studies LUCAS was used in combination with other data (e.g. Griffiths et al., 2019), while others collected samples using crowdsourcing (Bayas et al., 2016).
Overall, all applied PSs yield LU/LC classifications with high semantic accuracy. The application of PS 1 and PS 2, which were originally developed for lower resolution Landsat imagery, demonstrates similar accuracies in comparison to the respective original works. This highlights the transferability of the existing approaches for the classification of Sentinel-2 imagery. With PS 3, we introduce a restructured class hierarchy which describes LU/LC classes that can be classified with high semantic accuracy.

LUCAS sample positioning
The results in this study stress the fact that locating LUCAS samples for remote sensing analyses is neither intuitive nor trivial. While GRID locations must be acquired separately or reconstructed from the LUCAS-ID, GPS locations are provided alongside the data and therefore appear more accessible to the user. However, the chosen positioning method has a major impact on the classification accuracy for all experiments (see Fig. 2). It is evident that choosing the theoretical location of the points (GRID) over the provided GPS location improves the classification accuracy regardless of the PS. The distance between the two positioning approaches, GPS and GRID, measures only a few meters in most cases. However, this undirected shift induces measurable classification errors especially in high resolution imagery. The strength of this effect is dependent on the chosen pre-processing scheme even after applying a maximal observation distance for GPS locations. For example, the effect between GPS and GRID experiments is the smallest in PS 2 because there are already strict rules in place concerning the distance between the observer and the observed point. The positioning pitfall is most prominent with samples of the class water areas (G), which are typically located on the waterfront but not in the water. With the original GPS location, the spectral feature space is distorted.
Excluding information about these areas based on the observation distance (i.e., the distance between the GPS point and the original GRID point) ignores the fact that the information can be confidently registered from the shore, even though the distance may be particularly large. As the positioning approach has a major impact on the classification accuracy in high-resolution remote sensing applications, we suggest using the GRID locations. In that case, no selection of samples based on a maximum observation distance needs to be applied, and more LUCAS samples are available for the classification process.
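As a minimal sketch of the GRID reconstruction and the observation distance discussed above, the following Python snippet assumes that the eight-digit LUCAS point ID encodes the easting (first four digits) and the northing (last four digits) in kilometres in the ETRS89-LAEA projection (EPSG:3035). This encoding, the function names, and the example values are assumptions that are not stated in this paper and should be checked against the LUCAS documentation. The GPS position is assumed to be already projected to the same coordinate system.

```python
import math


def grid_from_point_id(point_id):
    """Return (easting, northing) in metres for a LUCAS point ID.

    Assumption: the ID has the form XXXXYYYY, with both halves given
    in kilometres in ETRS89-LAEA (EPSG:3035).
    """
    digits = str(point_id).zfill(8)
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"unexpected point ID: {point_id!r}")
    easting_m = int(digits[:4]) * 1000
    northing_m = int(digits[4:]) * 1000
    return easting_m, northing_m


def observation_distance(grid_xy, gps_xy):
    """Planar distance in metres between the GRID and the GPS position."""
    return math.hypot(gps_xy[0] - grid_xy[0], gps_xy[1] - grid_xy[1])


# Hypothetical point ID and hypothetical, already projected GPS position.
grid = grid_from_point_id(43122870)
gps = (4312030.0, 2869960.0)
dist = observation_distance(grid, gps)
```

With the GRID position in hand, the image features can be sampled at the theoretical location, and the distance is only needed if a maximum observation distance is to be applied to GPS-based experiments.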

Image classification
Combining in-situ data from the LUCAS database with satellite imagery can lead to several common errors. For example, effects like shadowed areas represented in the image are not anticipated in the LUCAS class hierarchy. Additionally, especially with increasing spatial image resolution, urban environments featuring many tall buildings and street canyons can be falsely assigned to water areas due to their spectral similarity. Furthermore, narrow roads overgrown by trees that are clearly recognizable by a human observer on the ground are not identifiable as such from a bird's eye view; hence, they are likely to be misclassified.
Classes with few samples in their stratum, for instance open soil, are prone to a greater number of classification errors. The error matrix shows that this specific class was misclassified at a higher rate than the other classes. This may originate from the smaller sample size and from a spectral similarity to other classes such as urban fabric.
In this study, 229 spectral, textural, and auxiliary image layers were used for image classification; the images had a compressed data volume of 2.5 terabytes. Although this feature space provides promising results, it comes at a high computational cost and requires expertise in big geographical data analysis. Feature selection techniques can therefore be applied to identify the most discriminatory image layers, thereby reducing the computing effort.
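To illustrate how per-class error rates can be read from an error matrix, the following Python sketch computes the overall accuracy together with producer's and user's accuracies from raw point counts. The matrix values and the function name are hypothetical and do not reproduce the results of this study; the sketch also uses plain counts and is not the estimator behind the overall accuracy reported in the paper.

```python
def accuracy_measures(matrix, labels):
    """Compute overall, producer's, and user's accuracy.

    matrix[i][j] is the number of validation points with reference
    class i that were mapped to class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    overall = correct / total
    producers = {}
    users = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])                  # reference points of class i
        col_total = sum(row[i] for row in matrix)   # points mapped to class i
        producers[label] = matrix[i][i] / row_total if row_total else float("nan")
        users[label] = matrix[i][i] / col_total if col_total else float("nan")
    return overall, producers, users


# Hypothetical counts for three classes (rows: reference, columns: map).
labels = ["urban fabric", "open soil", "water areas"]
matrix = [
    [45, 4, 1],
    [6, 12, 2],
    [1, 0, 29],
]
overall, producers, users = accuracy_measures(matrix, labels)
```

In this invented example, the class with the fewest reference points (open soil) has the lowest producer's accuracy, and most of its errors are confusions with urban fabric, which mirrors the pattern described above.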
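The paper does not name a specific feature selection technique. As one possible illustration, the following Python sketch ranks image layers by a Fisher-type score (between-class variance divided by within-class variance) and keeps the best k layers. The choice of this score, the function names, and the toy data are assumptions made for this example only.

```python
from statistics import fmean, pvariance


def fisher_scores(samples, classes):
    """Return one Fisher-type score per feature.

    samples: list of feature vectors, one per reference point.
    classes: list of class labels of the same length.
    """
    groups = {}
    for vec, cls in zip(samples, classes):
        groups.setdefault(cls, []).append(vec)
    scores = []
    for j in range(len(samples[0])):
        grand_mean = fmean(vec[j] for vec in samples)
        between = 0.0
        within = 0.0
        for members in groups.values():
            values = [vec[j] for vec in members]
            between += len(values) * (fmean(values) - grand_mean) ** 2
            within += len(values) * pvariance(values)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def top_k_layers(samples, classes, names, k):
    """Return the names of the k layers with the highest scores."""
    scores = fisher_scores(samples, classes)
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy data: the first layer separates the classes, the second does not.
names = ["layer_a", "layer_b"]
samples = [[0.8, 0.5], [0.7, 0.4], [0.1, 0.45], [0.2, 0.55]]
classes = ["forest", "forest", "water", "water"]
scores = fisher_scores(samples, classes)
selected = top_k_layers(samples, classes, names, k=1)
```

A univariate score like this ignores redundancy between layers, so in practice it would only be a first filter before training the classifier on the reduced feature space.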
Applying the pre-processing scheme proposed in this study to the image data for national land cover classification shows that the LUCAS samples do in fact represent a suitable database for high-resolution land cover mapping with Sentinel-2 data. Landscape characteristics can be identified at the local to national scale with high accuracy. The final LC map provides an unprecedented foundation for future applications which profit from the high spatial resolution, such as interdisciplinary analyses of human habitats for environmental justice (cf. Weigand et al., 2019).

Conclusion
In this study, in-situ sample data from the pan-European LUCAS were used as ground truth data for high-resolution remote sensing LC mapping. In the literature, no standard or generally accepted pre-processing scheme for LUCAS samples currently exists. We therefore investigated LUCAS sample pre-processing in the context of remote sensing image classification. We found possible obstacles in its application and proposed a new scheme dedicated to high-resolution land cover mapping using Sentinel-2 data. Both the proposed and existing preprocessing approaches were applied to high spatial resolution multispectral imagery. Furthermore, we distinguished the location of the LUCAS samples using the originally intended GRID position and the GPS location recorded by the observer. For classification, high-resolution Sentinel-2 spectral values and their derived indices (NDVI, NDBI, and NDWI) were utilized. We also included focal textures and imperviousness data.
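The derived indices named above are normalized band differences. The following Python sketch shows one common formulation for each; the specific NDWI variant (green and near-infrared) and the reflectance values are assumptions for illustration, since this section does not state which band combinations were used.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 if the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def spectral_indices(green, red, nir, swir):
    """Compute NDVI, NDBI, and NDWI from surface reflectance values.

    Assumed formulations:
      NDVI = (NIR - red) / (NIR + red)
      NDBI = (SWIR - NIR) / (SWIR + NIR)
      NDWI = (green - NIR) / (green + NIR)
    """
    return {
        "NDVI": normalized_difference(nir, red),
        "NDBI": normalized_difference(swir, nir),
        "NDWI": normalized_difference(green, nir),
    }


# Hypothetical reflectance values for a vegetated pixel.
indices = spectral_indices(green=0.08, red=0.05, nir=0.45, swir=0.20)
```

For a vegetated pixel such as this hypothetical one, NDVI is strongly positive while NDBI and NDWI are negative, which is what makes the indices useful as additional image layers.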
The results show that for all experiments, the positioning approach had a significant impact on the accuracy of the final classification results. Choosing GRID locations accounted for a 3.7% gain in accuracy on average. We therefore recommend that any image classification approach relying on LUCAS sample pre-processing retrieves both the originally intended and the actually observed locations of the LUCAS samples instead of utilizing only the recorded GPS location. All pre-processing schemes using GRID locations showed overall accuracies of greater than 80%. The results of this study demonstrate that LUCAS in-situ data is well-suited for large-scale LU/LC classifications based on Sentinel-2 data that use spectral and textural features. Moreover, pre-processing schemes that were originally developed for the classification of Landsat imagery also provided highly accurate results in higher-resolution Sentinel-2 imagery. The proposed pre-processing approach of the LUCAS samples, PS 3, in combination with positioning the samples in the original INSPIRE locations, resulted in a classification with an estimated overall accuracy (Ô) of 93.07%. This product can be acquired via DOI: https://doi.org/10.15489/1ccmlap3mn39 for further research.
This study focused on the classification of imagery at the national scale. In future studies, Sentinel-2 imagery can be deployed at larger scales to derive high spatial resolution LU/LC information for Europe, similar to what was accomplished with Landsat imagery by Pflugmacher et al. (2019). Making use of the longitudinal character of the LUCAS program, updates to the area-wide classification can be obtained using LUCAS and Sentinel-2 data from 2018, allowing for the analysis of LU/LC changes. Big data processing infrastructures (e.g. Gorelick et al., 2017) can be utilized to automate the workflow. Furthermore, large reference data archives (Fritz et al., 2017) suggest the potential for high-resolution mapping at the global scale.

Funding
This study was partly funded by the German Federal Environmental Foundation (DBU). Funding for the projects "meinGruen" and "SAUBER" (funding codes 19F2073B and 19F2064B, respectively) was granted by the German Federal Ministry of Transport and Digital Infrastructure (BMVI), and funding for the project "Monitoring des Stadtgrüns (Monitoring of urban green)" was granted by the German Federal Institute for Research on Building, Urban Affairs and Spatial Development (BBSR). The funding sponsors had no role in the study design, interpretation, manuscript writing, or publishing decision.