Mapping the maximum extents of urban green spaces in 1039 cities using dense satellite images

Spatial data of urban green spaces (UGS) are critical for cities worldwide to evaluate their progress towards achieving the urban sustainable development goals on UGS. However, UGS maps at the global scale with acceptable accuracies are not readily available. In this study, we mapped UGS of all 1039 mid- and large-sized cities across the globe in 2015 with dense remote sensing data (i.e. 51 494 Landsat images) and Google Earth Engine (GEE) platform. Also, we quantified the spatial distribution and accessibility of UGS within the cities. By combining the greenest pixel compositing method and the percentile-based image compositing method, we were able to obtain the maximum extent of UGS in cities while better differentiating UGS from other vegetation such as croplands. The mean overall classification accuracy reached 89.26% (SD = 3.26%), which was higher than existing global land cover products. Our maps showed that the mean UGS coverage in 1039 cities was 38.46% (SD = 20.27%), while the mean UGS accessibility was 82.67% (SD = 22.89%). However, there was a distinctive spatial equity issue as cities in high-income countries had higher coverage and better accessibility than cities in low-income countries. Besides developing a protocol for large-scale UGS mapping, our study results provide key baseline information to support international endeavors to fulfill the relevant urban sustainable development goals.


Introduction
Over half of the world's population lives in cities and more are expected in the future. The proportion of urban population is predicted to increase from 55% to 68% between 2018 and 2050 (United Nations 2019). With the rapid urbanization, more people benefit from the convenience of city life, but many are also subjected to health risks posed by urban environmental problems (Yang et al 2014). Urban green spaces (UGS) could alleviate some of these risks by providing ecosystem services such as regulating local climate (Zhang et al 2017), removing air pollutants (Lee et al 2021), reducing stormwater run-off (Szota et al 2019), and providing recreational opportunities (Haase et al 2014). Due to the importance of UGS, providing universal access to urban residents has been established by United Nations as a target of the sustainable development goal 11 (United Nations 2015a). Therefore, spatial data of the extent and accessibility of UGS are critical for city governments to gauge their progress towards achieving the relevant sustainable development goal.
Spatial data of UGS become increasingly available in certain parts of the world due to the proliferation of remotely sensed data. Spatial data of UGS at the city level have increased the most. Many cities now have their UGS mapped, such as Atlanta in the United States (Miller 2012), Beijing in China , and Leipzig in Germany (Haase et al 2019). In comparison, spatial data of UGS at the regional level increased modestly. At the country level, UGS in major cities in the United States and China have been mapped multiple times (Nowak and Greenfield 2012, Zhao et al 2013, Yang et al 2014, Kuang and Dou 2020. At the continental level, UGS of 386 European cities in 2000 (Fuller and Gaston 2009) Corbane et al (2020) investigated greenness changes between 1990 and 2014 in 10323 urban centers by using NDVI as an indicator of greenness. However, using NDVI as an indicator of greenness cannot determine the specific extents of UGS and mix UGS with other vegetation types (e.g. crop).
Whereas spatial data of UGS are becoming more available, data at a global scale are still in urgent need for four main reasons. First, definitions of urban areas, data sets, mapping methods, and time frames used in different UGS studies varied greatly Gaston 2009, Nowak andGreenfield 2018), which makes it unsuitable to combine different UGS products to infer the global pattern or to make comparison among cities. Second, it is challenging to map UGS accurately and consistently across large scales due to the spectral similarity of different vegetation types (e.g. tree/shrub versus cropland) and seasonal changes of UGS coverage (Degerickx et al 2020, Yao et al 2020. For example, the UGS coverage reaches the highest value at the peak of the growing season (Yao et al 2020), while the value changes in different seasons, especially for the deciduous urban trees. Thus, a better way to compare UGS coverage among cities is to compare the maximum extent of UGS coverage of each city. Third, a geographic coverage bias exists in UGS studies. Cities in Europe, Asia, and North America have been mapped more extensively than cities in Africa and South America (Kabisch et al 2015). Finally, UGS has not been targeted as an individual class in global land cover or land use mapping efforts. For example, several global land cover products have been developed at both fine and coarse resolutions, such as the 30 m FROM-GLC product (circa 2010, 2017) (Gong et al 2013(Gong et al , 2019, 30 m GlobeLand30 product (circa 2000, 2010, 2020) (Chen et al 2015), and 300 m ESA Land Cover CCI product (ESA-CCI) from year 1992 to 2019 (Kirches et al 2017, Defourny et al 2019. These global land cover products were generated with high consistency in mapping approaches. However, the aim of these products is to provide information of the entire Earth surface with an acceptable accuracy, e.g. 64.9% in Gong et al (2013) and75.4% in Kirches et al (2017). Due to the high heterogeneity of urban landscape, it is reasonable to suspect that UGS will be one of the classes with high classification uncertainty in these global land cover products.
In this study, we aim to analyze global patterns of UGS coverage and accessibility through developing a new framework for mapping UGS in cities across the globe in a more accurate and comparable way using dense satellite images. Specifically, the objectives of this study include: (a) to develop a protocol that allows for large-scale UGS mapping by combining the greenest pixel compositing method and the percentile-based image compositing method based on the GEE platform; (b) to create a database of UGS in global cities using the developed protocol and compare the result to existing global land cover products, and (c) to describe the patterns of UGS in cities across the globe.

Study area
All mid-and large-sized cities, i.e. cities with a population of 500 000 and more, were selected as study areas (figure 1). According to the UN's 'World urbanization prospects: The 2014 Revision' report, there are 1039 mid-and large-sized world cities (United Nations 2015b). We focused on mid-and large-sized cities for two reasons: (a) these cities host a bulk share of the world's urban population, i.e. 51% of the total urban population, and (b) residents in these cities rely more on UGS for their wellbeing than their counterparts in small cities, where accesses to surrounding natural environments are more feasible (Oliveira et al 2014). A list of the 1039 cities can be found in supplementary data 1.

Delineating boundaries of urban areas
The first step of mapping UGS at the global scale is to determine the boundaries of urban areas in a consistent manner. Liu et al (2014) showed that definitions of urban areas varied significantly in different parts of the world. The definitions of administrative boundaries vary in different countries (Cohen 2004). Therefore, it is not feasible to make cross-country comparison of UGS using administrative boundaries to derive UGS coverage. To generate a product that is comparable across the globe, this study defined the urban areas as zones enclosed in boundaries of continuous built-up areas.
An object-based urban thresholding method was used to extract out urban boundaries from nighttime lights data and urban reference data (Zhou et al 2015, Xie and Weng 2017). Rough built-up boundaries were first manually delineated on Google Earth (GE) for each city to reduce the amount of data processing. The rough built-up boundaries must be larger than the real built-up boundaries but are within the administrative boundaries of studied cities (see an example in figure S1 (available online at stacks.iop.org/ERL/16/ 064072/mmedia)).
The monthly NPP/VIIRS nighttime lights data for the year 2014 and 2015 were downloaded from the website of the US National Geographic Data Center (https://ngdc.noaa.gov/eog/download.html). The yearly mean value of VIIRS data was calculated and the rough built-up boundaries were used to clip out each city. The Global Human Settlement Layer (GHSL) data (Freire et al 2016) for the year 2014 was downloaded from the website (https: //ghslsys.jrc.ec.europa.eu/datasets.php) and used as urban reference data.
Then the Marker-controlled watershed segmentation algorithm (Zhou et al 2015, Xie andWeng 2017) was used to segment the VIIRS nighttime data of the year 2014. Using the 2014 GHSL data as the reference data, an object-based optimal threshold for each city was derived to classify the segmented VIIRS data in 2014 into urban and non-urban areas. The derived optimal threshold was used to classify the segmented VIIRS data of the year 2015 into urban and nonurban areas. The urban and non-urban areas were converted into shapefiles. For each city, the largest urban patch was selected, and holes inside the patch were filled to generate a continuous built-up area for each city. The boundary of this area was used as the boundary of the urban area for this city (supplementary data 2).

Mapping urban green spaces
A protocol was developed to map UGS in the 1039 cities (figure 2). To obtain UGS maps of each city, we first map land covers of each city with Landsat satellite images and training samples. Then UGS cover maps of each city were extracted from the land cover maps within derived boundaries.
The 30 m Landsat Collection 1 Level-2 satellite images, which are atmospherically corrected surface reflectance data produced by the United States Geological Survey, were used for land cover classification. All Landsat satellite images were available for analysis on GEE platform (Gorelick et al 2017), and both Landsat 7 ETM+ and Landsat 8 OLI/TIRS images were used in this study. The Landsat images with more than 70% cloud cover and images taken in winter were excluded. All other available images covering the 1039 cities in the year 2015 were used for further processing. For cities where images from the year 2015 were not adequate for generating good quality composited images due to cloud contamination, images from the year 2014 and the year 2016 were also used. In the end, 51 494 images (with 7548 images from the year 2014, 36 404 images from the year 2015 and 7542 images from the year 2016) were used in our analysis. Three steps were performed to pre-process the Landsat images. First, the pixel_qa band of the Landsat image was used to mask the cloud, cloud shadow, and SLC-off (Landsat 7 images) pixels. Second, the blue, green, red, near infrared, and two shortwave infrared bands of Landsat images were selected for further processing. Third, the NDVI (Tucker 1979), normalized difference built-up index (Zha et al 2003), and modified normalized difference water index (Xu 2006) were calculated with the selected original bands of Landsat images to enhance the information on vegetation, impervious surface, and water.
To deal with the inconsistent coverage caused by cloud contamination and utilize the seasonal signal in classification, we used the greenest pixel compositing method (Trianni et al 2015) and the percentilebased image compositing method (Hansen et al 2013) to create composited images for further analysis. The greenest pixel compositing method uses pixel values from the image in which the NDVI value was the highest among all available images to create a new composited image. The created composited images can reflect the status of ground cover at the peak growing season (Trianni et al 2015), which is ideal for separating vegetation from non-vegetation and giving the maximum extents of vegetation cover. The percentile-based image compositing method first ranks all available pixel values for each band and then uses percentile values to create a new composited image. The percentile-based composited images can capture phenology information without any explicit assumption and prior knowledge of the timing of phenology (Azzari and Lobell, 2017); thus it will be useful to differentiate vegetation types.
To classify the composited images, training samples were collected by using a pooling approach. We divided the 1039 cities into 13 city groups following 'urban ecoregions' proposed by Schneider et al (2010) and collected training samples for each city group. For each city group, representative training samples for six land cover classes (including water, impervious surface, bare soil, tree/shrub, grass, and crop) were collected by referring to the highresolution images on GE. About 500-3000 sites or polygons were collected for each land cover class in each city group. In total, 159 936 sites or polygons were collected.
Land cover classification for each city was performed using the following procedures. First, the percentile-based composited Landsat images were classified into water and non-water classes with the training samples and random forests classifier (Breiman 2001). Then the non-water classes were further classified into vegetation and non-vegetation classes based on the greenest pixel composited Landsat images. The non-vegetation classes were classified into impervious surface and bare soil classes. Using the percentile-based composited Landsat images, the vegetation classes were further classified into tree/shrub, grass and crop. Finally, six class land cover maps of each city in 2015 can be generated by integrating the above-mentioned results. Accuracy assessment of the land cover maps was performed by extracting 100 pixels as validation samples from each classified land cover map using a stratified random sampling method (Olofsson et al 2014). The strata here were the land cover classes and the number of pixels sampled for each stratum was proportional to the proportion of land cover class to the total area (Huang et al 2017). The validation samples were then interpreted visually by referencing highresolution images on GE. The confusion matrix of land cover classification was generated according to the results of validation samples interpretation and classification and the overall accuracy and kappa coefficient were calculated to assess the accuracy of the land cover classification (Congalton 1991).
The UGS map of each city was generated by extracting out the tree/shrub cover and the grass cover within the derived boundary of built-up area from land cover maps. The tree/shrub cover and the grass cover classes were merged as UGS for each city.

Urban green space coverage and accessibility calculation
Using the UGS dataset, we analyzed the pattern of UGS for cities across the globe using two indicators: the urban green space coverage (UGSC) and urban green space accessibility (UGSA). The UGSC of each city was calculated as: where UGS area is the area of UGS, and BUA area is the area of continuous built-up areas. The UGSA of each city was calculated as follows (Annerstedt Van Den Bosch et al 2016, World Health Organization 2016): where N ACC is the sum of urban inhabitants that live nearby (in 300 m linear distance) to UGS that have a size of 0.5 hm 2 and over, and N TOTAL is the total number of urban inhabitants in the continuous built-up areas. Urban inhabitants were derived from the global population grid dataset (Freire et al 2016).
Using the derived urban boundaries in section 2.  al 2017), respectively. The classification accuracies of urban land covers within the four products were examined using the same set of validation samples. Green spaces-related land cover types such as forests and grasslands of these four products were extracted out and merged as UGS. The UGSC for these four products was then calculated using the equation (1). The values were compared to the value derived from the UGS map generated in this study. The significance of difference was tested using the Kruskal-Wallis test method since data was not normally distributed (Shapiro-Wilk test, P < 0.001) and post-hoc tests for all-pairwise comparisons were performed.

Urban green space spatial equity and accessibility analysis
Spatial patterns of UGSC and UGSA among cities were analyzed by the biomes where the cities are located and the income level of the countries. The biomes include arid semi-arid desert, boreal forest, Mediterranean, temperate forest, temperate grassland, tropical subtropical forest, and tropical subtropical grassland (Olson et al 2001). The income levels of countries include high income, upper middle income, lower middle income, and low income as classified by the World Bank (www.worldbank.org/). To test the significance of differences of UGSC and UGSA of cities, the Kruskal-Wallis test was performed since data were not normally distributed (Shapiro-Wilk test, P < 0.001), followed by post-hoc tests for all-pairwise comparisons. These tests were conducted using the kruskal.test and posthoc.kruskal.nemenyi.test functions in the PMCMR package (Pohlert 2014) in R (version 3.3.1; R Development Core Team, Vienna, Austria).

Mapping result of urban green space
We generated urban land cover maps for the 1039 large-and medium-sized global cities (supplementary data 1, supplementary data 3). The mean overall land cover classification accuracy was 89.26% (SD = 3.26%) (figure 3, table 1), which met the accuracy level for land cover analysis, i.e. 85% (Thomlinson et al 1999). The mean value of Kappa coefficient was 0.84 (SD = 0.06) ( figure 3, table 1). The overall land cover accuracy in urban areas of the four global land cover products ranged from 53.23% to 65.79% and the Kappa coefficients ranged from 0.43 to 0.58 (table S1-S4). The confusion matrix of classification for all cities is shown below (table 1) while the confusion matrix of classification for each city can be found in supplementary data 4.
With the classified land cover maps, we generated UGS maps of the 1039 cities (supplementary data 3). London, UK and Beijing, China were used as examples to show the difference in UGS maps generated from our land cover maps and the four existing global land cover products (figures 4 and S2). The result shows that many small UGS patches were correctly identified by using our method while missed in  the global land cover products (figure S2). The mean UGSC of the 1039 global cities estimated using our data and the four existing global land cover products differed significantly (table 2).

The global pattern of urban green space coverage and accessibility in 2015
The mean UGSC and UGSA in continuous built-up areas of the 1039 large-and medium-sized global cities in 2015 were 38.46% (SD = 20.27%) and 82.67% (SD = 22.89%), respectively (supplementary data 1). Figure 5 showed the global pattern of UGSC and UGSA in the 1039 cities. Most cities in Europe and the Eastern US had relatively high levels of UGSC (>60%), while cities in Western Asia and Northern Africa had a low level of UGSC (<20%). Cities in Western Asia and Northern Africa had a low level of UGSA (<30%), while cities in Europe and the Eastern US had a relatively high level of UGSA (>80%). The mean UGSC (57.91 ± 10.72%) and UGSA (99.34 ± 0.89%) of cities in the boreal forest were the highest, while the mean UGSC (14.18 ± 15.58%) and UGSA (46.77 ± 31.28%) of cities in the arid semi-arid desert were the lowest. Overall, the mean UGSC and UGSA of cities in forest biomes were higher than cities in grassland biomes and the mean UGSC and UGSA of cities in grassland biomes were higher than cities in the desert biome. The mean UGSC (49.60 ± 24.48%) and UGSA (87.78 ± 22.91%) of cities in high-income countries were the highest, while the mean UGSC (32.99 ± 15.31%) and UGSA (80.68 ± 22.15%) of cities in lower-middle-income countries were the lowest. To consider both biome types and income levels of countries, the mean UGSC (66.74 ± 2.89%) and UGSA (99.37 ± 0.59%) of cities in high-income countries and boreal forest biome were the highest, while the mean UGSC (1.40 ± 0.89%) and UGSA (16.49 ± 10.88%) of cities in high-income countries and arid semi-arid desert biome were the lowest. For the same biome types, cities in high-income countries tend to have larger mean UGSC and UGSA than cities in other income levels of countries (figure 6).

Mapping urban green space at the global scale
With the mapping approach developed in this study, we generated UGS maps of all 1039 large-and  medium-sized global cities in the year 2015 with a mean overall classification accuracy of 89.26% (SD = 3.26%). Compared with UGS maps derived from the four existing global land cover products, our product delineated the UGS in these cities more accurately. The difference may be attributed to the fact that global land cover products were aimed to map land covers for the whole earth's lands but not specifically for the urban areas. Land cover mapping in urban areas needs specific attention because urban landscapes are usually very heterogeneous (Zhou et al 2014). Besides, global land cover products usually use , which can be inadequate for accurately characterizing land covers in urban settings. For example, the minimum mapping unit of GlobeLand30 and ESA-CCI-LC is 9 hm 2 while it is 25 hm 2 for MODIS-LC product. Small (2003) showed that the characteristic length of urban features is between 10 and 20 m. Many small-and medium-sized green spaces would be omitted at these coarse resolutions. However, small-and medium-sized green spaces often form the bulk of UGS. For instance, in Sheffield, UK, 50% of UGS were patches less than 0.59 hm 2 (Fuller et al 2010). We obtained better results because we focused only on urban areas and used a classification approach that tailored to urban areas. For these reasons, our estimates of UGS are also much higher than that of the four global land cover products. Compared to UGS mapping work at the city or regional scales, our mapping approaches have some unique advantages. First, most studies used satellite images at a single date to run land cover classification and extract UGS thereafter (Miller 2012, Haase et al 2019. However, it is challenging to differentiate vegetation types using single-date satellite images due to the spectral similarities of different vegetation types (Liu and Yang 2013). Previous studies have shown that cropland was easily confused with other vegetation (e.g. trees and grasses) in urban land cover classification especially using a mono-period satellite image (Zhang et al 2019). Thus, the percentile-based compositing approach which combines information of multiple images in a year should be an important improvement for differentiating vegetation types. Second, vegetation coverage usually varied differently with seasonal changes (Wei et al 2018), thus mapping UGS with different dates of satellite images may cause varied extents of UGS. Here we mapped the maximum extents of UGS by using the greenest pixel composited images which reflect the peak growing status in a year; thus our mapping results are consistent and more comparable among different cities. Lastly, there are several studies that also focused on UGS at a global scale (Kuang 2019, Corbane et al 2020, Richards andBelcher 2020). However, these large scale studies either used NDVI as an indicator of urban vegetation cover (Corbane et al 2020) or did not differentiate different vegetation types (Kuang 2019, Richards and Belcher 2020). Their results mix UGS (i.e. trees/shrubs and grasslands) with other vegetation types (e.g. cropland). In this study, we have developed a new framework for mapping the maximum extent of UGS in global cities. The combined use of different satellite image compositing methods contributed to more accurate delineation of green spaces in urban areas. Also, the use of the adjusted threshold method on night lights data allowed for deriving boundaries of built-up areas in a consistent manner. Therefore, UGS maps generated with our approach gave a more accurate description of the baseline situation of UGS in global cities in 2015 and can be very useful for urban ecology and sustainability studies.

Global patterns of urban green space coverage and accessibility
With the derived UGS maps, we analyzed global patterns of UGSC and UGSA in the 1039 large-and medium-sized global cities. We found that cities in high-income countries had higher UGSC and UGSA values than cities in low to lower-middle income levels countries. This finding is consistent with previous studies (Kuang 2019). Kuang (2019) also found that UGSC in cities of developed countries such as the United States and the United Kingdom were higher than that of cities of less developed countries. This pattern indicates that economic status (e.g. per capita GDP) is an important factor that affects the global pattern of UGSC and UGSA. Previous regional (e.g. country or sub-continent) studies (Fuller and Gaston 2009, Zhao et al 2013, Richards et al 2017 have shown that UGSC was positively correlated with per capita GDP. We also quantified patterns of UGSC and UGSA in the 1039 large-and medium-sized global cities across biomes. We found that the mean UGSC and UGSA of cities in forest biomes were higher than cities in grassland biomes and cities in the arid semi-arid desert biome. This pattern reflects the dominant role of climate conditions on UGS of cities in different biomes. The finding is in line with the general theory that climate controls the distribution of major vegetation types of the world (Woodward 1987). A similar pattern was found in a regional study by Nowak et al (1996). They found that cities within forested areas had higher percent tree cover values than cities in grasslands, and the percent tree cover of cities in grasslands was higher than cities located in deserts in the US.
To consider both biome types and income levels of countries, we found that cities in richer countries tend to have larger UGSC and UGSA for cities in poor countries in the same biome type. This shows that economic status plays an important role in determine UGSC and UGSA for cities in similar climate conditions. The 'luxury effect' hypothesis (Zhao et al 2013, Richards et al 2017, i.e. cities with more economic powers can invest more in UGS, might explain this difference. It is a challenging task for cities in less developed economies when trying to improve their UGSC and UGSA, which contributes to environmental injustice (Wolch et al 2014).
Our findings contribute to novel global-scale UGS mapping methods and provide key baseline information for UGS planning and management towards urban sustainability at the regional to global scale. The wide range of UGSC and UGSA among cities indicates that there is room for improving the UGSC and UGSA of global cities. However, the international organizations, central governments, and municipalities must hold a realistic view on this goal because the abundance of UGS around the globe is determined by both natural and socioeconomic conditions. Actions can be taken to increase the UGSC of cities where climatic and socioeconomic conditions are favorable. At the same time, it may not be sustainable to push for higher UGSC in cities facing challenging climatic and hydrological conditions (e.g. low precipitation, high drought risk, groundwater depletion) even when financially feasible. To push for higher coverage in cities where climatic conditions are favorable but financial resources do not exist may be equally unsustainable. If a substantial increase of UGS is not an option for a city, the focus should be put on improving the access of existing UGS and distributing any new UGS strategically to maximize accessibility. Strategies such as making cities 'just green enough' can be adopted to increase the supply of UGS and protect social and ecological sustainability (Wolch et al 2014).

Limitations of this study
While our study provides new and detailed information on maps of UGS and global patterns of UGSC and UGSA in global cities, limitations of the current study should be noted. First, the 30 m resolution of satellite image used in this study might still be too coarse for mapping small UGS patches (Qian et al 2015. Future studies may consider using finer resolution satellite images to map small UGS patches better. Second, we mapped UGS of large-and medium-sized cities but not including small-sized cities. Although these large-and mediumsized cities accounted for half of the world's urban population and probably a significant portion of all UGS, the inclusion of small-sized cities in future studies can provide a more complete picture. Third, it should be noted that the GHSL data may be biased in some areas which could bring uncertainties. Finally, despite an enhanced ability to differentiate among vegetation types through the percentile-based composited images, there are still remaining misclassifications that could be investigated more closely in the future. For example, future studies may further improve classification performances by incorporating more relevant predictors (e.g. the texture metrics of images) in the urban land cover classification process.

Conclusions
In this study, we developed a protocol to map UGS of all 1039 mid-and large-sized cities across the globe in 2015 with dense Landsat remote sensing data and the GEE platform and analyzed global patterns of UGSC and UGSA in these cities. We mapped the maximum extent of UGS in the 1039 cities and our product delineated the UGS in these cities more accurately comparing with four existing global land cover products. We found that cities in high-income countries had higher UGSC and UGSA values than cities in low to lower-middle income levels countries and that the mean UGSC and UGSA of cities in forest biomes were higher than cities in grassland biomes, and the mean UGSC and UGSA of cities in grassland biomes were higher than cities in arid semi-arid desert biomes. Our study results provide key baseline information to support international endeavors to fulfill the relevant urban sustainable development goals, and the UGS maps we generated could be useful to regional to global scale UGS planning and management.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).
(Grant No. 2019YFA0607201), the Tsinghua-Berkeley Research Collaboration grant from Tsinghua University, the Tsinghua University Initiative Scientific Research Program (Grant No. 20203080018) and the National Key Scientific and Technological Infrastructure project 'Earth System Science Numerical Simulator Facility' (EarthLab).