Reliability and consistency assessment of land cover products at macro and local scales in typical cities

ABSTRACT Because urban areas are more heterogeneous than natural areas, it is crucial to assess fine-resolution land cover products and to understand how they differ in urban areas so that they can be used efficiently in various application scenarios. In this study, five typical cities in China were chosen as study areas to evaluate four commonly used 30 m land cover products: GLC_FCS30-2020, FROM-GLC30-2017, Globeland30-2020, and CLCD-2019. We analyzed the reliability of these four products using validation samples and by examining their area and spatial pattern consistency. Given the limitations of traditional accuracy assessments at the macro level, we added a local area evaluation to further examine the classification details in these products. The macro results indicated that the four land cover products had similar overall accuracies within urban areas, all surpassing 76%, but the consistency among them was low, ranging from 42.21% to 61.13%. The local accuracy assessment showed that GLC_FCS30-2020 and FROM-GLC30-2017 performed well in reflecting the intricate details of the city; however, all four products exhibited varying degrees of misclassification and omission. These findings suggest that more sophisticated algorithms are needed to account for urban particularities, since fine-resolution land cover products may fail to capture complex urban details.


Introduction
Land use/land cover (LULC) describes the combination of natural and anthropogenic landscapes, which plays a critical role in influencing global change (Tran et al. 2017). Reliable and temporally consistent land cover datasets, and the choice among them, can significantly influence the outcomes of integrated assessment models used for hydrological circulation, ecosystem service, climate simulation and atmospheric process research (Bontemps et al. 2012; Hibbard et al. 2010; Mora et al. 2014). Additionally, land use/land cover change (LUCC) has a high value in practical applications (Turner, Moss, and Skole 1993). For instance, accurate LUCC data can help establish urban heat island (UHI) mitigation strategies and plan future urban construction (Tran et al. 2017), ecosystem service values (ESVs) can be assessed by using monetary values preassigned to each biome or LUCC type (Song and Deng 2017), and LUCC has a major impact on the persistence of biodiversity worldwide (Mantyka-Pringle et al. 2015). Notably, LUCC analysis also depends on well-classified land cover products.
Based on advanced satellite remote sensing technology and classification algorithms, many high-quality global land cover products have been produced, such as the IGBP-DISCover product of the USGS (Loveland et al. 2000), the University of Maryland (UMD) land cover product (Hansen and Reed 2000), the GLC2000 product of the European Union Joint Research Center (Bartholome and Belward 2005), the MODIS land cover product of Boston University (Friedl et al. 2010), and the GLOBCOVER product of the European Space Agency (Bicheron et al. 2008). In recent years, owing to the rapid development of remote sensing technology and the continuous improvement of computing capabilities, the spatial resolution of land cover products has gradually increased from the kilometer level to the meter level (30 and 10 m). The 30 m resolution products include the Globeland30 land cover products of the National Basic Geographic Information Center of China, which adopted a classification strategy based on the 'pixel-object-knowledge' rule (Chen et al. 2015). The other three commonly used products are the GLC_FCS30 products of the Chinese Academy of Sciences (Zhang et al. 2021), the FROM-GLC30 products generated by Tsinghua University (Chen et al. 2019) and the CLCD product produced by Wuhan University (Yang and Huang 2021). All three products used the random forest algorithm to add flexible thematic detail. These four high-resolution land cover products can basically support relevant quantitative studies, providing statistics on specific land cover types (Yu et al. 2013) and land cover changes and predictions on a regional scale (Rimal et al. 2018). Before such products are applied, it is imperative that their data quality be rigorously and comprehensively assessed.
In environmental and geological research, the accuracy of land cover products is crucial. Using sample validation and consistency analysis, numerous accuracy evaluations of land cover products have been carried out over the past decades. Most validation datasets within a regional study area are collected via GPS field surveys or visual interpretation on Google Earth (Kang et al. 2020; Liang et al. 2019), which can be labor-intensive. Google Earth provides a high-resolution image repository and convenient operation for obtaining validation samples. Another strategy for sample accuracy assessment is to use existing global or regional validation datasets such as Geo-Wiki, GLCVSS and LUCAS (Gao et al. 2020; Wang et al. 2022). Consistency analysis, as opposed to sample accuracy evaluation, focuses on substantial variances across products to assess and determine the application value of these products in specific areas. International comparative studies on consistency are well established and highly comprehensive. An evaluation index system for land cover product consistency analysis has been gradually constructed, including comparisons of land cover type areas, error matrix statistics and spatial pixel confusion matrices (Liu et al. 2021; Tchuenté, Roujean, and De Jong 2011).
However, current accuracy assessments and consistency investigations of land cover products often focus on the overall region or natural areas (Liang et al. 2019; Liu et al. 2021; Wang et al. 2022) and lack validation within urban areas. The expansion of urban areas and land cover changes represent the degree of the temporal and spatial impacts of the continuous construction and modification activities of humans on the Earth system (Seto, Güneralp, and Hutyra 2012; Zhao et al. 2006). The complexity and diversity of the land cover classification in urban areas are much greater than those in most natural areas due to the high spatial heterogeneity of urban substrates. Therefore, high-resolution land cover products are of great value in urban area studies. Change analysis and prediction of LULC scenarios under future urban growth can be performed using mathematical modeling based on land cover products, and the outcome will help land use managers and urban planners make important decisions on better land use management (Koko et al. 2020; Onilude and Vaz 2021). The significance of accurate urban land cover products is also reflected in the influence on UHIs. Different urban land cover types have different thermal contributions to UHI effects, based on a quantification of the benefits from spatial aggregation (Yue et al. 2019; Zhao et al. 2020). For classification purposes, studies related to the specific orientation of urban areas need accurate land cover data. By indicating the map accuracy, the map producers aim to inform users of any product limits to encourage proper map use (Latifovic and Olthof 2004).
Compared with accuracy assessments of the study area as a whole, increasing the focus on local areas can ensure that regional heterogeneity is considered. After quantitative evaluation of the accuracy of the large-scale land cover dataset based on the subfractional confusion matrix, the misclassification of impervious areas as grasslands, croplands, water bodies and shrubs is detected in time, and the detailed phenomena are better described (Luyuan et al. 2015). In addition, when conducting assessments at the scale of urban agglomerations, detailed characteristics and differences of specific types can be obtained by combining qualitative comparisons at the global and local scales (Qiu et al. 2021). These studies affirm the necessity of local evaluation in the accuracy evaluation of land cover products. For regions with high heterogeneity, such as cities, wetlands, and fragmented mountains, a detailed local evaluation can prevent inflated accuracy evaluations. Therefore, thorough global and local comparisons should be used together to assess the reliability of land cover products in urban areas.
In light of the above issues and considerations, this study selected five cities, namely, Beijing, Nanjing, Guangzhou, Chengdu and Urumqi, as study areas and evaluated four sets of 30-m resolution land cover products, GLC_FCS30-2020, FROM-GLC30-2017, Globeland30-2020 and CLCD-2019, in terms of basic accuracy assessment, consistency analysis and local detail evaluation. In the process of evaluating these commonly used, high-resolution land cover products, we expected to identify the advantages and disadvantages of using these products for urban areas. To support the improved creation and application of land cover data in urban areas, we will also provide valuable feedback to land cover product producers and users by summarizing the causes of the problems.

Study area
Climate, soil, landforms, and other natural environmental conditions have a major impact on land cover. Within the same comprehensive physiographic zone, the physical geographic conditions are more similar, and thus, the composition and structure of the land cover are more consistent. This study integrated the richness and typicality of natural regions and their climatic zones as well as land cover types and selected five cities in China, namely, Beijing, Nanjing, Guangzhou, Chengdu and Urumqi, as study areas to evaluate four commonly used, domestic, 30-m resolution land cover products. In terms of their natural regions, Beijing is in northern China, Nanjing is in eastern China, Guangzhou is in southern China, Chengdu is in central China and Urumqi is in northwest China. According to the Köppen climate classification (Köppen 1884), Beijing is in a hot summer continental climate zone, Nanjing is in a humid subtropical climate zone, Guangzhou and Chengdu are in the dry winter subtropical climate zone, and Urumqi is in the cold desert climate zone. Figure 1 shows the geographical distribution of the five study areas and their climatic zones.

Data and preprocessing
We selected four commonly used 30 m land cover products covering the five Chinese cities for accuracy assessment and consistency analysis. The Globeland30-2020 product included the land cover for Antarctica, which was omitted in the previous version. According to the sample evaluation results based on the landscape shape index sampling model, the overall global accuracy of the Globeland30-2020 product improved by 2.2%. The FROM-GLC30-2017 product was based on the first multiseasonal sample library for training (Li et al. 2017), which reduced the impacts derived from seasonal change. To address the issue of spatial transition discontinuities in the original products, the GLC_FCS30-2020 product made substantial use of extensive expert knowledge and auxiliary data. The CLCD-2019 was a national-scale land cover product for China, with the advantage of an annual temporal resolution. This product simultaneously released land cover data from 1990 to 2019, which makes it more suitable for LUCC studies. Table 1 shows basic information for the four products.
Since the projection, spatial resolution, spatial extent and classification systems of the four products were not identical, the data needed to be preprocessed before a unified accuracy evaluation could be carried out. We first reprojected the land cover datasets to the UTM projection and cropped the datasets based on the city boundary. Then, we used plural sampling methods to resample all the datasets to a spatial resolution of 30 m. As the four land cover product classification schemes varied and their descriptions did not exactly correspond to the same categories, using a shared validation dataset for direct comparison was not sufficient for their assessment. Therefore, conversion of the classification systems was required (Chen et al. 2019; Yang and Huang 2021; Zhang et al. 2021; Buchhorn et al. 2020). According to the user guidelines of these products and the widely used IGBP system, nine significant land cover categories were designated.
Table 2 shows the conversion of the classification systems for the four products.

Validation samples
To obtain a set of validation samples with a good spatial distribution and high compatibility, we used the stratified random sampling method because it offers a good balance of statistical rigor and practicality. Figure 2 shows the flow of the stratified random sampling scheme and an example of the interpretation of a sample on Google Earth. In this sampling method, the areal proportion of each land cover type in the study area determines the sample size of each stratum. Therefore, we used the higher resolution ESA-WorldCover data as reference data to estimate the area and proportion of each land cover type in the study area. Then, a statistical method based on the multinomial distribution was used to determine the overall sample size of each study area. The result is the minimum number of samples that must be obtained, which is the basic precondition for the sample to be representative. The specific equation is as follows:

$$N = \frac{B \, W_i \, (1 - W_i)}{b_i^{2}}$$

where $N$ is the sample size; $W_i$ is the areal proportion of type $i$, the land cover type whose proportion is closest to 50% among all land cover types; $b_i$ is the allowable error for type $i$; and $B$ is the (b/k) percentile of the $\chi^2$ distribution with 1 degree of freedom, which can be read from the $\chi^2$ distribution table with 1 degree of freedom.
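The minimum sample size described above can be sketched numerically. The following Python snippet is an illustration only, assuming the commonly used multinomial form N = B·W(1 − W)/b², with B taken as the upper (α/k) quantile of the χ² distribution with 1 degree of freedom. The values α = 0.05, b = 0.05 and k = 9 classes are assumptions chosen for the example; the study does not report them here.

```python
import math
from statistics import NormalDist


def chi2_upper_quantile_1df(tail_prob):
    """Return B such that P(chi2 with 1 d.o.f. > B) = tail_prob.

    Uses the identity chi2_1 = Z**2 for a standard normal Z,
    so no external statistics package is needed.
    """
    z = NormalDist().inv_cdf(1.0 - tail_prob / 2.0)
    return z * z


def multinomial_sample_size(w_i, b_i, alpha, k):
    """Minimum sample size N = B * W_i * (1 - W_i) / b_i**2, rounded up."""
    big_b = chi2_upper_quantile_1df(alpha / k)
    return math.ceil(big_b * w_i * (1.0 - w_i) / b_i ** 2)


# Assumed inputs: the dominant class covers about 50% of the area,
# 5% allowable error, 95% confidence level, nine land cover classes.
n_min = multinomial_sample_size(w_i=0.5, b_i=0.05, alpha=0.05, k=9)
print(n_min)
```

Under these assumed inputs the minimum falls somewhat below 800, so a sample of that size would satisfy the requirement; other choices of α, b or k change the result.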
According to the multinomial distribution statistical method and the proportion of the area within each study area that was closest to 50%, the initial overall sample size for each of the five cities was set at 800. Next, samples were randomly selected in proportion to the area of the different land cover types. Considering the need to further validate the samples and to supplement the sample points for land cover types with a disproportionately small area, ground truthing of these samples was conducted based on visual interpretation of Google Earth images, Sentinel images, and other third-party validation samples. Given its enriched time series, high resolution, ease of access, and extensive coverage, Google Earth is the most frequently utilized data source for accuracy evaluation (Wang et al. 2022; Zhao et al. 2014; Meng et al. 2015). In addition, we adhered to the following guidelines while choosing and interpreting the samples to minimize the detrimental effects of potential placement and interpretation errors on the overall sample quality: (1) Sampling points were selected from the centers of 210 × 210 m homogenized areas because Google Earth has a positioning error of approximately 15 m and the spatial resolution was 30 m. (2) To minimize interpretational errors due to the temporal phase, we relied on multitemporal data from previous years while primarily using remote sensing images from 2017 to 2020 that were temporally consistent with the four products. (3) For samples that were more challenging to interpret, the interpretation was supplemented with additional data such as volunteered geographic information.
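The proportional allocation step can be illustrated with a short Python sketch. It assumes a largest-remainder rounding rule so that the per-class counts sum exactly to the total; the rounding rule and the area shares below are hypothetical and are not taken from the study.

```python
def allocate_samples(area_share, total):
    """Split `total` samples across classes in proportion to area share.

    Largest-remainder rounding (an assumed rule) keeps the sum equal to `total`.
    """
    raw = {cls: total * share for cls, share in area_share.items()}
    counts = {cls: int(value) for cls, value in raw.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining samples to the classes with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda cls: raw[cls] - counts[cls], reverse=True)
    for cls in by_remainder[:leftover]:
        counts[cls] += 1
    return counts


# Hypothetical area shares for one city (they must sum to 1).
shares = {"cropland": 0.45, "forest": 0.25, "impervious": 0.20,
          "water": 0.07, "grassland": 0.03}
plan = allocate_samples(shares, 800)
print(plan)
```

Classes with very small shares receive few points under this rule, which is why the text supplements sample points for land cover types with a disproportionately small area.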
One concern was that there may have been minor discrepancies owing to the different base years (2017, 2019 and 2020) for the land cover in the same study area, which may have resulted in different land cover types being interpreted for the same site samples. However, based on the fact that only 8% of the land cover totally changed over a decade in regions where the land use is shifting rapidly, the validation sample was less likely to change over a four-year period, and the resulting error was quite small, between approximately 0.32% and 3.20% (Wang et al. 2022; Yang and Huang 2021). This indicated that the validation sample sets collected for this study can be accepted in their entirety for different products. Figure 3 depicts the distribution of sample points, while Table 3 lists the total number of sample points for each city by type.

Confusion matrix
Due to its excellent visualization capability, easy operability and ability to explicitly compare different categories, the confusion matrix is one of the most essential tools for comparing the quality of multiple land cover products (Kang et al. 2020; Canters 1997). Based on reference data, the confusion matrix aggregates the number of correct and incorrect classification results and breaks them down by each land cover type. Different shades of color are used to indicate the number of sample points in the confusion matrix so that the extent of misclassification for different land cover products can be visualized. In addition, some frequently used measures derived from the confusion matrix can verify the reliability of the product from different perspectives. From a cartographic perspective, the producer accuracy (PA) refers to the agreement between land cover products and actual land cover types; the user accuracy (UA) shows the usability of the product from the perspective of using the maps; the overall accuracy (OA) measures the overall type consistency between the product and the real land cover distribution; and the Kappa coefficient compensates for the changes in the metric ratios due to small variations in the number of pixels of each type. The following formulas may be used to calculate each indicator (Fung and LeDrew 1988):

$$PA_i = \frac{x_{ii}}{x_{+i}}$$

$$UA_i = \frac{x_{ii}}{x_{i+}}$$

$$OA = \frac{\sum_{i=1}^{r} x_{ii}}{n}$$

$$Kappa = \frac{n \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+} \, x_{+i}}{n^{2} - \sum_{i=1}^{r} x_{i+} \, x_{+i}}$$

where $r$ denotes the number of land cover types, $n$ denotes the total number of sample pixels in the study area, $x_{ii}$ denotes the number of pixels that have been classified correctly as land cover type $i$, $x_{i+}$ denotes the number of type $i$ pixels in the validation sample sets, and $x_{+i}$ denotes the number of type $i$ pixels in the reference data corresponding to the sample set.
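The four indicators can be computed directly from a confusion matrix. The Python sketch below is a minimal illustration, assuming that rows hold the class mapped by the product and columns hold the reference class; that layout and the toy matrix are assumptions for the example, not data from the study.

```python
def accuracy_metrics(matrix):
    """Return (OA, Kappa, PA list, UA list) for a square confusion matrix.

    Assumed layout: matrix[i][j] counts samples mapped as class i
    whose reference class is j.
    """
    r = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(matrix[i]) for i in range(r)]                        # x_{i+}
    col_tot = [sum(matrix[i][j] for i in range(r)) for j in range(r)]   # x_{+i}
    diag = sum(matrix[i][i] for i in range(r))
    oa = diag / n
    chance = sum(row_tot[i] * col_tot[i] for i in range(r))
    kappa = (n * diag - chance) / (n * n - chance)
    # PA divides by reference totals, UA by mapped totals; empty classes give NaN.
    pa = [matrix[i][i] / col_tot[i] if col_tot[i] else float("nan") for i in range(r)]
    ua = [matrix[i][i] / row_tot[i] if row_tot[i] else float("nan") for i in range(r)]
    return oa, kappa, pa, ua


# Toy three-class matrix (hypothetical counts).
toy = [[50, 5, 0],
       [10, 30, 0],
       [0, 5, 20]]
oa, kappa, pa, ua = accuracy_metrics(toy)
print(round(oa, 4), round(kappa, 4))
```

In this toy case the Kappa coefficient is lower than the overall accuracy because part of the agreement is attributed to chance.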

Area composition consistency
We first counted the area composition per land cover type for the four land cover products within each study area and presented them in the form of bar charts. The different color bars represent different products. Through these bar charts, we mainly focused on whether the area proportion of each type was consistent with reality, and we qualitatively compared the similarities among the different products. The correlation coefficient was then used to quantitatively assess the similarity of various products based on the land cover type area. The formula is as follows:

$$R_{XY} = \frac{\sum_{k=1}^{n} (X_k - \bar{X})(Y_k - \bar{Y})}{\sqrt{\sum_{k=1}^{n} (X_k - \bar{X})^{2} \, \sum_{k=1}^{n} (Y_k - \bar{Y})^{2}}}$$

where $k$ is the land cover type, $R_{XY}$ represents the area correlation coefficient between the X product and the Y product, $X_k$ denotes the total area of type $k$ in the X product, $Y_k$ has the same meaning as $X_k$ but for the Y product, $\bar{X}$ represents the mean area over all land cover types in the X product, $\bar{Y}$ has the same meaning as $\bar{X}$ but for the Y product, and $n$ denotes the number of land cover types.
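The area correlation coefficient is the Pearson correlation between the per-type areas of two products. A minimal Python sketch follows; the area values are hypothetical and serve only to show the calculation.

```python
import math


def area_correlation(x_areas, y_areas):
    """Pearson correlation between per-type areas of two products."""
    n = len(x_areas)
    mean_x = sum(x_areas) / n
    mean_y = sum(y_areas) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_areas, y_areas))
    var_x = sum((x - mean_x) ** 2 for x in x_areas)
    var_y = sum((y - mean_y) ** 2 for y in y_areas)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical areas (km^2) for the same ordered list of land cover types.
product_x = [6600.0, 1200.0, 900.0, 800.0, 100.0]
product_y = [5200.0, 1500.0, 1300.0, 900.0, 300.0]
r_xy = area_correlation(product_x, product_y)
print(round(r_xy, 3))
```

Because the coefficient is driven by the largest classes, two products can be highly correlated in area composition while still disagreeing on where each class is located, which is why the spatial pattern comparison is carried out separately.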

Spatial pattern consistency
The spatial consistency was described by comparing whether the land cover types indicated by various land cover products at the same location were identical. A spatial overlay analysis of the products was performed, pixel-by-pixel statistics were calculated, and the statistical results were displayed in the form of maps by land cover type. To some extent, the spatial consistency assessment can be used to explore the degree of similarity among the products in a more intuitive and visual way (Gao et al. 2020; Bai et al. 2014). Since the distribution of land cover types with areas that are too small cannot be effectively displayed, we selected the four land cover types with the largest areas in each city (Hu et al. 2015). Using cropland as an example, the degree of consistency can be divided into five levels in descending order: (1) reappearance number 4: all four products showed that a given pixel was cropland; (2) reappearance number 3: three products showed cropland at the given pixel; (3) reappearance number 2: two products showed cropland at the given pixel; (4) reappearance number 1: only one product showed cropland at the given pixel; and (5) reappearance number 0: none of the four products showed cropland at the given pixel.
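The reappearance number for a single land cover type can be sketched as a per-pixel count over the co-registered products. The Python example below is illustrative only; the pixel labels are hypothetical, and real rasters would be processed as arrays rather than tuples.

```python
def reappearance_number(labels, target):
    """Count how many products assign `target` to one pixel."""
    return sum(1 for label in labels if label == target)


def reappearance_histogram(pixel_stacks, target):
    """Number of pixels at each reappearance level 0..n_products."""
    n_products = len(pixel_stacks[0])
    hist = [0] * (n_products + 1)
    for labels in pixel_stacks:
        hist[reappearance_number(labels, target)] += 1
    return hist


# Hypothetical co-registered pixels; each tuple holds the four product labels.
pixels = [
    ("cropland", "cropland", "cropland", "cropland"),
    ("cropland", "forest", "cropland", "cropland"),
    ("forest", "forest", "grassland", "forest"),
    ("cropland", "grassland", "forest", "water"),
]
hist = reappearance_histogram(pixels, "cropland")
print(hist)  # index = reappearance number, value = pixel count
```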
Based on the analysis of a single land cover type, the consistency of all types in the products can be explored further. From high to low, there were four levels of consistency at a given pixel: (1) completely consistent: all four data products showed the same land cover type; (2) high consistency: three products showed the same type; (3) low consistency: only two products showed the same type; and (4) completely inconsistent: the four land cover products all showed different types.
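The four consistency levels can be derived from the size of the largest group of agreeing products at a pixel. The Python sketch below makes one assumption that the text does not spell out: a pixel where the products split into two agreeing pairs is treated as low consistency.

```python
from collections import Counter

LEVELS = {
    4: "completely consistent",
    3: "high consistency",
    2: "low consistency",
    1: "completely inconsistent",
}


def consistency_level(labels):
    """Classify one pixel from the labels given by exactly four products."""
    if len(labels) != 4:
        raise ValueError("expected labels from exactly four products")
    # Size of the largest group of products that agree on a type.
    largest_group = Counter(labels).most_common(1)[0][1]
    return LEVELS[largest_group]


print(consistency_level(("forest", "forest", "forest", "forest")))
print(consistency_level(("forest", "cropland", "forest", "cropland")))  # assumed: low
```

Counting the levels over all pixels of a city and dividing by the pixel total gives the percentage shares of each consistency level.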

Local area detail evaluation
During the visual comparison of the land cover products, some obvious local misclassifications were found. The most obvious were that the FROM-GLC30-2017 product classified the shadows cast by taller buildings as water bodies and that forests in urban areas were misclassified as cropland, both of which occurred more often in urban centers. These typical areas were selected, and the four products were compared with the more accurate ESA-10 product (Zanaga et al. 2021) (https://esa-worldcover.org/) and Google Earth base maps. For a more visual and quantitative presentation, a certain number of sample points were randomly selected in each area to assess the four products. The misclassifications of the sample points were counted, and the OA of the four products was calculated. Since impervious areas are the most predominant land cover type in these regions, the precision of the impervious area sample points was added as a metric to screen the misclassifications of the products. The impervious area precision of a product was defined as the ratio of the number of impervious area samples that were correctly classified to the overall number of samples that were identified as impervious areas in that product.
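The two local indicators can be sketched in a few lines of Python. The example follows the definition given above, where precision is the share of samples labelled impervious by a product that are impervious in the reference; the sample labels are hypothetical.

```python
def overall_accuracy(mapped, reference):
    """Share of sample points whose mapped label matches the reference."""
    hits = sum(1 for m, r in zip(mapped, reference) if m == r)
    return hits / len(reference)


def class_precision(mapped, reference, label):
    """Correctly mapped `label` samples divided by all samples mapped as `label`."""
    flagged = [r for m, r in zip(mapped, reference) if m == label]
    if not flagged:
        return float("nan")  # the product never used this label
    return sum(1 for r in flagged if r == label) / len(flagged)


# Hypothetical sample points for one product in one local area.
mapped = ["impervious", "impervious", "water", "impervious", "cropland", "impervious"]
reference = ["impervious", "impervious", "impervious", "forest", "forest", "impervious"]

local_oa = overall_accuracy(mapped, reference)
imp_precision = class_precision(mapped, reference, "impervious")
print(local_oa, imp_precision)
```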

Accuracy analysis based on the confusion matrix
Figure 4 displays the detailed confusion matrices of the four land cover products for the five selected study cities, which illustrate the misclassification of samples for each land cover type. Table 4, which compares the accuracy performance of the various products in the five cities, lists the overall accuracy and Kappa coefficients. According to this figure and table, the accuracy of each land cover product was in accordance with its own description, and the difference between the best-performing and worst-performing land cover products was no more than 10%. Additionally, in contrast to the three southern cities, the four products were more accurate in Beijing (79.11% to 87.02%) and Urumqi (74.30% to 82.51%) (Table 4). Based on the confusion matrices, it can be seen that in Beijing, the problem with the FROM-GLC30-2017 product is that some impervious area samples are classified as grassland and cropland, and a large number of grassland samples are classified as cropland. In Nanjing and Chengdu, all four land cover products misclassified forestland as cropland, especially the GLC_FCS30-2020 and CLCD-2019 products, which misclassified over 50% of the forestland sample points as cropland. In Urumqi, the Globeland30-2020 product misclassified over 66% of the bare land samples as grassland.
Considering the validation samples from the five selected study areas as a whole, we obtained four confusion matrices (Figure 5) representing the four land cover products. In addition, we compared the UA and PA for different land cover types (Table 5). The overall accuracy of all four products exceeded 75%, and the best performing product in terms of overall accuracy was the CLCD-2019 (78.48%). The Globeland30-2020 product was a close second, with an accuracy of 77.15%. However, the confusion matrix clearly showed that there was more obvious cropland-woodland-grassland confusion in both products and a very low UA for wetlands, bare lands and shrubs. The GLC_FCS30-2020 product had an overall accuracy of 76.78%, and its advantage was that it slightly outperformed the other products, with an accuracy of over 15% for both wetland and shrub samples. The FROM-GLC30-2017 product had an overall accuracy of 76.28%, but its accuracy for the impervious area type was below 80%, and impervious areas were misclassified more often as bare land, cropland, grassland and water. In summary, all four products had high accuracies for the cropland, forest, water, impervious area and permanent snow land cover types. These land cover types have relatively distinct spectral and textural characteristics, are relatively independent in their spatial distribution and cover large areas, which makes them easier to discern on remote sensing imagery.

Comparative analysis of land cover distribution
The area distribution by land cover type in the five selected cities for the four products is shown in Figure 6, and the correlation coefficients among the four products are summarized in Table 6. Generally, the land composition characteristics of each city should be described essentially in the same way by the different products. Beijing is dominated by grassland, cropland, impervious areas and forests, and there is a difference of more than 8% in the cropland and forest coverage between the GLC_FCS30-2020 and CLCD-2019 products. Nanjing is dominated by cropland, impervious areas, water and forests, with cropland being the most dominant land cover type. In the GLC_FCS30-2020, FROM-GLC30-2017, Globeland30-2020 and CLCD-2019 products, the area of cropland accounts for 66%, 52%, 58% and 63% of the total area, respectively, and the GLC_FCS30-2020 and FROM-GLC30-2017 products significantly differ, with a difference of 14%. Guangzhou is dominated by forests, cropland, impervious areas and water. A low consistency is more prominent between the two land cover types with the highest proportions of area (cropland and forest), especially in the GLC_FCS30-2020 and FROM-GLC30-2017 products, which show a difference of approximately 10%. Chengdu is dominated by cropland, forests, impervious areas and grassland, with cropland covering the largest area. The FROM-GLC30-2017 product has a low consistency with the other three products in its area composition of both croplands and forests. Urumqi is dominated by grassland, bare lands, cropland and impervious areas. The most prevalent types of land cover are bare land and grassland. The Globeland30-2020 product differs substantially from the other three products in the area composition of these two land cover types. We suspect that this product erroneously labels significant areas of bare lands as grassland.
The land type compositions of the five study areas are shown in Figure 6(f). To quantify the compositional correlation between products, the correlation coefficient R_XY is calculated, and the results are listed in Table 6. The GLC_FCS30-2020 and CLCD-2019 products are strongly correlated in terms of the area of each land cover type (0.990), while the FROM-GLC30-2017 and Globeland30-2020 products are the least correlated pair (0.953).

Consistency analysis of spatial patterns
The consistency of land cover products at each raster pixel was counted using a pixel-by-pixel spatial overlay method. To obtain the spatial consistency of the predominant land cover types within each city, the four products were first spatially overlaid for the four types with the greatest areas, and the results are shown in Figure 7.
The forests are mainly distributed in Beijing, Nanjing, Guangzhou and Chengdu, and the spatial consistency is high for this land cover type within each city, particularly in Beijing and the northwestern side of Chengdu. Because forests in mountainous areas are continuous, extensive and evenly distributed, they are quite easy to identify. However, in most of the plain areas, such as the central part of Nanjing and the eastern side of Chengdu (Figure 7(b-d)), the spatial consistency of forest identification deteriorates rapidly due to the intensive human activities in these areas, where forest patches are finely fragmented and heterogeneous and tend to form mixed types. Cropland is the dominant land cover type in all five cities. Except for Urumqi, there is a high spatial consistency for the cropland land cover type among the land cover products, with basically three or more products indicating the presence of croplands at the same locations. As a partly artificial and partly natural land cover type, cropland has obvious spectral and textural characteristics; when classified by remote sensing, this type has clear classification principles and bases, and fewer categories can be confused with it.
The major rivers and lakes in each city are generally consistent with the spatial distribution of water in the products, whereas the inconsistent places are mostly found along river tributaries. Overall, the identification of water is highly consistent across the four datasets, with fewer areas of ambiguity. However, water is characterized by seasonal and interannual variability, and inconsistencies are found in the products from one year to the next. Figure 7(b) shows that there is an obvious spatial inconsistency in the large paddy field area on the southwest side of Nanjing. This is mainly because the paddy field area is surrounded by interspersed water and has a lower spectral reflectance than general cropland, while the presence of dry periods at different times can also cause confusion. The Globeland30-2020 product is significantly influenced by a mono-temporal data source and reflects the water condition only at a particular time, which reduces the accuracy of water identification. The degree of consistency is low across products for the grassland type. Although more than 40% of Urumqi is grassland, the consistency image clearly shows that one product greatly differs from the other three as a result of the mixture of grassland and bare land. Figure 6(e) can be used to confirm this outcome. Among the other cities with a higher percentage of grassland, only the northernmost part of Chengdu has a reappearance number of 4 for the grassland type (Figure 7(d)). In Beijing, several golf courses with large grassland areas are correctly classified only by the Globeland30-2020 product (Figure 7(a)).
Table 7 shows the spatial consistency of the four products across all land types in each study area, with the highest degree of spatial consistency being in Chengdu. The percentage of pixels that have a 'completely consistent' land cover type among the four products at the same location in Chengdu is 61.6%, while 11.3% of the pixels have a 'low consistency', and only 0.1% of the pixels are 'completely inconsistent'. Because large areas of Urumqi in the Globeland30-2020 product do not match the types shown by the other three products, Urumqi has the lowest percentage of 'completely consistent' pixels, at only 42.21%. Compared with the two cities mentioned above, the spatial consistency of the four products in Beijing, Nanjing and Guangzhou is intermediate and does not differ significantly among these cities.

Typical error classification on the local area
When the accuracy of the products was evaluated using a validation sample set at the urban scale (Section 3.1), it was difficult to identify which land cover products incorrectly classified impervious areas as water, and the UA of all four products was over 79%. However, there were numerous highly heterogeneous city regions, and the misclassification of these areas often went unnoticed in the large-scale verification. To evaluate the four land cover products more comprehensively and objectively, three localized areas within the Chaoyang District of Beijing, the Tianhe District of Guangzhou, and the Shaibak District of Urumqi were selected from the five study areas. These three local areas (each approximately 3 km × 2 km) are all within the central area of their city, which has a larger population and higher heterogeneity than other areas of the city. The performance of the four products in urban areas can be better investigated by assessing their accuracy in these three regions.
Figure 8 (Beijing), Figure 9 (Guangzhou) and Figure 10 (Urumqi) present the local area accuracy assessment findings. The four land cover products were first visually compared with the reference data from the ESA-10 and Google Earth high-resolution maps. Then, based on Google Earth, approximately 200 randomly selected sample points were used to verify the accuracy of the four land cover products and to count the misclassifications of the major cover types in the region. Two indicators were selected to evaluate the performance of the products: the overall accuracy and the precision of the impervious area (the land cover type with the largest percentage of area in the regions) of the validation sample in the three regions.
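The two indicators can be illustrated with a short Python sketch over a list of interpreted sample points. This is an illustration under stated assumptions rather than the authors' procedure: the text does not define how the impervious area precision is computed, so the sketch shows both the conventional precision (user's accuracy) and the share of reference impervious samples that a product labels correctly (producer's accuracy). All function names and sample labels are hypothetical.

```python
def overall_accuracy(reference, predicted):
    """Fraction of sample points whose product label equals the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def users_accuracy(reference, predicted, target):
    """Of the samples the product labels as `target`, the fraction that truly are."""
    truth = [r for r, p in zip(reference, predicted) if p == target]
    return sum(r == target for r in truth) / len(truth) if truth else None


def producers_accuracy(reference, predicted, target):
    """Of the reference samples of class `target`, the fraction the product finds."""
    mapped = [p for r, p in zip(reference, predicted) if r == target]
    return sum(p == target for p in mapped) / len(mapped) if mapped else None


# Hypothetical sample points: reference labels interpreted on high-resolution
# imagery versus the labels read from one land cover product.
reference = ["imp", "imp", "imp", "imp", "forest", "forest", "water", "imp"]
predicted = ["imp", "water", "imp", "imp", "crop", "forest", "water", "imp"]

oa = overall_accuracy(reference, predicted)
ua = users_accuracy(reference, predicted, "imp")
pa = producers_accuracy(reference, predicted, "imp")
print(f"OA={oa:.2f}, impervious UA={ua:.2f}, impervious PA={pa:.2f}")
```

In this toy example the single impervious sample labelled as water lowers the producer's accuracy but not the user's accuracy, which is why the choice of indicator matters when shadows are confused with water.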
Observational evaluation of the local areas within the study areas revealed that the FROM-GLC30-2017 and GLC_FCS30-2020 products performed well in reflecting the intricate details of the city, but they more often misclassified shadows cast on the roads by tall buildings as water. This was particularly true of the FROM-GLC30-2017 product, in which more than 25% of the impervious area samples in all three local study areas were classified as water or wetlands. Additionally, the GLC_FCS30-2020 and CLCD-2019 products misclassified vast areas of forested land in the urban center as croplands. The classification characteristics of the other products and their strengths and weaknesses can also be seen. For instance, the Globeland30-2020 product exhibited few of the abovementioned misclassifications in its local evaluation results, but it categorized all pixels within the research domain as impervious area, resulting in many misclassified types and a lack of detailed classification information such as roads, buildings and parks.

Misclassification of urban land cover types
The misclassification of land cover types in urban regions can significantly affect relevant quantitative studies, and a detailed local evaluation can help users identify these problems. In this study, the local evaluation shows that in urban areas, the FROM-GLC30-2017 and GLC_FCS30-2020 products misclassify the shadows cast by tall buildings on the roads as water.
Additionally, the GLC_FCS30-2020 and CLCD-2019 products misclassify vast areas of forested land in the urban center as cropland. Although the Globeland30-2020 product exhibits few of the abovementioned misclassifications in its local evaluation results, it cannot recognize fine details in regions of high heterogeneity. These misclassifications in urban areas may have detrimental effects when the land cover products are applied to landscape modeling and quantitative system simulations (Pauleit, Ennos, and Golding 2005). As in the example cited in the introduction, when exploring how urban land cover contributes thermally to the mitigation of UHIs, the results indicate that impervious areas and buildings appear to have a strong warming effect, while water bodies, grassland, and urban trees have different cooling effects (Zhao et al. 2020). Land cover products are also applied in assessing the average provision potential for ecosystem services, and classification errors due to the high incidence of mixed land use categories in urban areas have a great influence on the results and the derived green infrastructure planning (García et al. 2020). Therefore, misclassification in urban land cover data will cause errors in regional quantification results and land cover type difference analysis (Grimm et al. 2008). Since it is necessary to ensure that the validation samples are statistically rigorous and can be easily implemented (Stehman and Foody 2019; Tsendbazar et al. 2018), global sample accuracy evaluation and consistency analysis can only describe the land cover products at a macro level, and the local problems that occur in urban areas are difficult to detect. Therefore, in addition to macroscopic accuracy evaluation, it is also crucial to discover the details of product misclassification and to conduct local area comparison and accuracy evaluation while ensuring that the validation method is accurate and can be implemented.
To make the conclusions transferable to global quality assessments of the land cover products over distinct urban contexts, we chose seven representative metropolises (New York, Paris, London, Singapore, Shanghai, Tokyo and Sydney) and inspected local areas within them to investigate whether the aforementioned characteristic misclassifications occur in the land cover products being assessed. We found that, apart from CLCD, whose classification scope is limited to China, the other three products exhibited varying degrees of the aforementioned misclassifications in places such as Singapore, Tokyo, Sydney, and New York. However, there were no instances of the products misclassifying building shadows as water bodies in Paris and London.
The occurrence of shadows is a result of light being obstructed by an object (Zhou et al. 2021). After examining the data source and the characteristics of the regions, we found that in areas where the buildings are tall and densely distributed, the area of the shadows cast by the buildings is substantial. Furthermore, when viewed on Landsat multispectral imagery with a 30 m spatial resolution, these shadows exhibit spectral characteristics similar to water bodies, specifically, low reflectance values and a similar trend (Jiang et al. 2014; Zhang, Sun, and Li 2014). In contrast, European cities such as Paris and London have lower building heights and densities compared to large cities in Asia and North America, resulting in fewer misclassifications.

Accuracy and consistency difference factors
The study revealed that the reliability of all land cover products varied over different study areas (Figure 4), and the consistency of spatial distribution was lower, ranging from 42.21% to 61.17% in each city (Table 7), which was often influenced by factors such as data origin, classification systems and categorization strategies.
In terms of data sources, the Globeland30-2020 product was affected by its mono-temporal data source (Zhao et al. 2014), which gave it a poor ability to recognize water and bare land, both of which vary across seasons and years. Additionally, Globeland30-2020 classified a large area as grassland in Urumqi that was classified as bare land in the other three products (Figure 6). The GLC_FCS30-2020 product extracted spectrally homogeneous MODIS-Landsat areas through the global spatial-temporal spectral library (Friedl et al. 2010), filtered land cover heterogeneity from the CCI_LC-2015 product (Defourny et al. 2012), and then obtained a refined training sample library using statistics. Owing to its training samples with rich spectral features, GLC_FCS30-2020 recognized impervious areas and cropland more accurately (Table 4).
In regard to classification methods, GLC_FCS30-2020, FROM-GLC30-2017 and CLCD-2019 were generated using random forest algorithms. The random forest algorithm is one of the most powerful and robust supervised machine learning algorithms; it is capable of performing both regression and classification tasks and is quite popular in land cover mapping applications (Li et al. 2014). In the Globeland30-2020 product, pixel- and object-based approaches were combined for classification, and the knowledge and rich experience of skilled operators and the support of web services were used to verify the reliability of the results. This P-O-K method effectively reduced the uncertainties and errors caused by automatic machine classification. For instance, only the Globeland30-2020 product identified the golf course area within Beijing and classified it as grassland (Feng et al. 2016).
For the classification system, both Globeland30-2020 and CLCD-2019 have primary classification systems with 10 types, while both GLC_FCS30-2020 and FROM-GLC30-2017 have additional secondary classifications. There are 30 types of finer classifications for the GLC_FCS series products and 35 types of finer classifications for the FROM-GLC series products. Due to their precise definitions, transitional or ambiguous land cover types were not easily confused (Figure 4), which was also the reason why GLC_FCS30-2020 and FROM-GLC30-2017 had more classification details than the other two products. However, the different definitions of similar types also influenced the degree of consistency between the land cover products. The Globeland30-2020 product defines shrubs as 'land with shrub cover and more than 30% scrub cover, including montane scrub, deciduous and evergreen scrub, and desert scrub with more than 10% cover in desert areas' (J. Chen et al. 2015), while the FROM-GLC product defines shrubland as having a 'canopy cover of more than 20%' and a height of less than 5 m (B. Chen et al. 2019). Such differences persisted when the classification systems were converted to a unified system; therefore, the inconsistencies between different land cover products are not necessarily due to the products themselves.
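The conversion to a unified classification system can be pictured as a per-product lookup from native class codes to shared class names, applied before any pixel-wise comparison. The Python sketch below illustrates the idea only. The lookup tables, class codes and pixel values are hypothetical placeholders and do not reproduce the correspondence in Table 2 or the products' real legends.

```python
# Hypothetical lookup tables: native class code -> unified class name.
LOOKUP_PRODUCT_A = {10: "cropland", 20: "forest", 40: "shrubland"}
LOOKUP_PRODUCT_B = {1: "cropland", 2: "forest", 3: "shrubland"}


def harmonize(codes, lookup):
    """Translate native class codes into the unified legend."""
    try:
        return [lookup[code] for code in codes]
    except KeyError as err:
        raise ValueError(
            f"class code {err.args[0]!r} has no unified equivalent"
        ) from None


def agreement(labels_a, labels_b):
    """Fraction of pixels on which two harmonized products agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


# Hypothetical pixels: the last one is shrubland in product A but forest in
# product B, as can happen when shrub cover thresholds differ.
pixels_a = [10, 20, 40, 40]
pixels_b = [1, 2, 3, 2]

unified_a = harmonize(pixels_a, LOOKUP_PRODUCT_A)
unified_b = harmonize(pixels_b, LOOKUP_PRODUCT_B)
print(agreement(unified_a, unified_b))
```

Raising an error on an unmapped code, rather than silently dropping the pixel, keeps gaps in the correspondence table visible. The disagreement on the last pixel comes from the class definitions rather than from a classification error, which is the point made in the paragraph above.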
In addition to the three factors above, we also found that the accuracy and consistency were higher for large continuous distributions and clearly defined types such as water, cropland and large areas of bare land and grassland within Urumqi; in contrast, cropland in deserts, paddy fields and impervious areas distributed in complex and fragmented areas were more challenging to distinguish.

Uncertainty and future work
There are still some uncertainties and future perspectives in this study that need to be explored. First, the selection of the study areas considered a combination of cities with different climatic zones, different geographical locations and rich land cover types. However, there is no in-depth analysis of the different topographical features resulting from different geographical locations and the associated impacts. Therefore, we hope that elevation data from the study area can be used in future work to further analyze inconsistent pixels in the various land cover products. Additionally, the current research demonstrated only the first-level classification accuracy of the products, and a finer validation sample can be constructed later to evaluate the reliability of fine-scale classification products with similar classification systems.
In response to the identified misclassification of land cover types in urban areas, we also offer some suggestions for the improvement of these products. Because the data source is ortho-remote sensing images, the coverages of different features will overlap on the images, especially in urban areas where trees obscure buildings and shadows obscure roads. Therefore, in urban classification studies, remote sensing images need to be preprocessed or referenced with auxiliary data to determine the specific type of regional land cover. Specific recommendations include spectral feature extraction, spectral correlation matching (Kaviani Baghbaderani et al. 2020), and time series data adjustment using the continuous spectra of hyperspectral data (ZY1E) to achieve a higher classification accuracy for small sample regions after decomposition of the mixed pixels (Lu et al. 2021; Hennessy, Clarke, and Lewis 2020). For example, hyperspectral data can be used to distinguish between water bodies and the shadows projected onto roads by tall buildings, since the main cause of confusion is that both have a low reflectance and similar spectral trends in the multispectral profile. The type of land cover under vegetation shade can also be determined by applying radar data to penetrate the vegetation canopy and regionalize the focus (Walker et al. 2007; Steele-Dunne et al. 2017). High-quality nighttime-light remote sensing data (NPP-VIIRS, NOAA-20-VIIRS, Luojia 1-01) can identify urban low-intensity lights generated by traffic, small-scale residential sites and landmarks. Proper application of nighttime-light remotely sensed data can easily distinguish low-intensity lights from dark nonurban backgrounds and may be efficiently used to improve the accuracy of impervious area types (M. Zhao et al. 2022; Ou et al. 2019).

Conclusions
This research compared and analyzed the reliability and consistency of four regularly used land cover products at a 30 m resolution for five typical cities in China. The results provide a reference for selecting appropriate land cover data in numerous investigations conducted within urban regions. The evaluated land cover products included GLC_FCS30-2020, FROM-GLC30-2017, Globeland30-2020 and CLCD-2019. Through the sample accuracy assessment, we found that the overall accuracy of all products exceeded 76%; CLCD-2019 had the highest overall accuracy, ranging between 74.18% and 86.10% across the five study cities, but the differences among the four products were minor. However, in the examination of the spatial consistency of the products, the proportion of entirely consistent areas was low, ranging from 42.21% to 61.17%, and land cover types with a propensity for continuous distribution and low heterogeneity were more consistent. Finally, according to the local accuracy assessments, GLC_FCS30-2020 and FROM-GLC30-2017 performed well in reflecting the intricate details of the city, but compared to a higher resolution product at the whole city scale, all four products had varying degrees of misclassifications and omissions. These included the misclassification of the shadows cast by tall buildings on the roads as water bodies in the FROM-GLC30-2017 product, the misidentification of smaller patches of forest as cropland and the absence of detailed characteristic information in the Globeland30-2020 product.
In general, the accuracy of the four 30 m global land cover products that we assessed in urban areas of China was not as good as that in natural areas, especially for some vegetation types, such as grassland and shrubland. Moreover, despite fine-resolution land cover products being designed to capture complex urban details, the typical misclassifications in local urban areas that were found in our results will cause problems in studies that focus on LULC management and dynamics, as well as in related quantitative work. Because this study shows that combining local and macro scales leads to a more comprehensive assessment of land cover product accuracy, we suggest using multiple accuracy evaluation techniques to help verify spatial details. Future global land cover products may need separate algorithms for natural and urban areas or more sophisticated algorithms that consider urban particularities.

Figure 1. Locations of the studied cities in China. The climate zones in the background are from the Köppen climate classification, and the legends represent the climate classification number. The first letter of the climate classification indicates the main climate: A = tropical, B = arid, C = warm temperate, D = snow and E = polar. The second letter indicates the precipitation: W = desert, S = steppe, f = fully humid, s = summer dry, w = winter dry and m = monsoonal. The third letter indicates the temperature: h = hot arid, k = cold arid, a = hot summer, b = warm summer, c = cool summer, d = extremely continental, F = polar frost and T = polar tundra.

Figure 2. Stratified random sampling process and interpretation of samples for each land cover type on Google Earth.

Figure 4. Confusion matrix of the four land cover products in the five selected study cities: (a) Beijing, (b) Nanjing, (c) Guangzhou, (d) Chengdu, (e) Urumqi.

Figure 5. Confusion matrix of the four land cover products with all samples.

Figure 6. Area comparisons of the four land cover products in five study areas: (a) Beijing, (b) Nanjing, (c) Guangzhou, (d) Chengdu, (e) Urumqi, (f) Sum of sample points in five cities.

Figure 7. Distribution of the product agreement degree for main land cover types in the five selected cities: (a) Beijing, (b) Nanjing, (c) Guangzhou, (d) Chengdu, (e) Urumqi.

Figure 9. Local comparison of four products and reference data in Guangzhou (HD: High Definition, OA: Overall Accuracy, IP: Impervious area Precision. Number of random samples: Impervious area: 210; Forest: 65).

Figure 10. Local comparison of four products and reference data in Urumqi (HD: High Definition, OA: Overall Accuracy, IP: Impervious area Precision. Number of random samples: Impervious area: 225; Forest: 94).

Table 1. Basic information on the four land cover products used in this study.

Table 2. The products' classification systems and the correspondence with the combined one we utilized for this research.

Table 3. Number of validation samples in the five selected cities.

Table 4. Overall accuracy and Kappa coefficient of the four products in the five selected study cities.

Table 5. Producer's accuracy, User's accuracy and Overall accuracy of the four products.

Table 6. The area composition correlation coefficients between the four land cover products.

Table 7. Consistency of all land cover types from different land cover products.