University of Birmingham Assessing urban greenery by harvesting street view data

Urban greenery is of great significance for sustainable urban development due to the diverse ecosystem services it provides. Assessing urban greenery can reveal its impact on urban areas and provide the evidence base for strategic urban forest management and planning, thereby contributing to sustainable urban development. Street View (SV) images are being used more frequently and widely for assessing urban greenery because they offer a new perspective and reduce workload and research costs. In this paper, 135 peer-reviewed publications that employed SV to assess urban greenery between 2010 and 2022 are reviewed. Presently, the most common application of SV-based urban greenery research is extraction of the green view index. Although this has many potential applications for assessing ecosystem services, it has most often been used to date to identify the impact of street greenery on residents' physical and mental health, activities, and well-being (i


Introduction
Developing sustainable cities and communities is one of the 17 sustainable development goals proposed by the United Nations (UN, 2022), aiming to make urban areas more inclusive, safe, sustainable, and resilient. Urban areas need to have the ability to cope with disasters and climate change and minimise adverse impacts on the natural environment. Urban greenery is indispensable in developing sustainable cities due to its various Ecosystem Services (ES). In recent years, a growing number of studies have explored urban greenery's role in sustainable urban development. For example, Mell (2009) used London as an example to discuss the positive effect of green infrastructure on urban regeneration and sustainable development from ecological, economic, and social perspectives. He highlighted how green infrastructures create liveable spaces, enhance social well-being and cohesion, and increase the city's ability to control climate and manage water. Likewise, Voghera and Giudice (2019) reviewed two study cases in France and Italy to illustrate the contribution of green infrastructure to regional sustainability and resilience, including governing climate change, improving landscape quality, and increasing well-being.
To better understand the contribution of greenery to urban sustainability and to manage green infrastructure more effectively, assessments of urban greenery are increasingly necessary. Traditional methods for assessing urban greenery range from fieldwork for tree inventories, to the normalized difference vegetation index (NDVI), which uses remote sensing images to estimate the density of green vegetation, to leaf area index (LAI) or leaf area density (LAD) estimated by collecting fallen leaves or by hemispheric photography (Nichol and Lee, 2005; Wei et al., 2020). In recent years, many new and innovative technologies have been developed to facilitate this assessment. Street View (SV) is one notable innovation and has established itself as a widely used tool for assessing urban greenery. This article critically reviews the application of SV in urban greenery assessment and the methods commonly employed, and discusses the limitations and application prospects of this technology.

Urban greenery
Urban greenery, i.e., vegetation in urban areas, includes street greenery (street trees, shrubs, grasslands), parks, gardens, and forests (Konijnendijk et al., 2006). Architectural components covered with vegetation that have become popular in recent years, such as green roofs and living walls, also belong to urban greenery (Liberalesso et al., 2020). These elements compose green infrastructure networks in urban areas (European Commission, 2012).
Urban greenery provides a range of ES, i.e., the non-material benefits people obtain from urban greenery (FAO, 2022), for urban residents and the urban environment. Cultural services provided by urban greenery include the promotion of residents' activities (Yang et al., 2020), reduced morbidity from multiple physical diseases such as obesity and heart disease (Tsai et al., 2019; Nguyen et al., 2021), and relief of anxiety and depression (Olafsdottir et al., 2020). Regulating services consist of carbon sequestration (Strohbach et al., 2012; Nowak et al., 2013), regulation of the urban microclimate and mitigation of the urban heat island (Loughner et al., 2012), stormwater management (Berland and Hopton, 2014) and effects on air pollutants (Fujii et al., 2005; Dela Cruz et al., 2014; Viecco et al., 2018). In particular, urban greenery can influence the dispersion of air pollutants and thereby extend the distance between pollution source and human receptor, reducing exposure (Ferranti et al., 2019; Pearce et al., 2021). Urban greenery also offers supporting services, for example providing wildlife habitats and supporting urban biodiversity (Shackleton, 2016).
Urban greenery assessments aim to understand the quantity, quality, and distribution of the greenery, and to explore and quantify the ES it provides. Urban greenery can be assessed using various approaches. For example, NDVI quantifies the difference between near-infrared and visible red pixel values in multispectral remote sensing images to estimate a plot's vegetation density or vegetation health status. It is one of the most used methods in urban greenery research (Pettorelli et al., 2005). LAI and LAD are important parameters for measuring the energy and mass exchange between vegetation and the atmosphere (Yan et al., 2019). Tree inventories are a type of asset register used to record information about urban trees, including quantity, location, species, diameter at breast height (DBH), and tree height. Researchers can further explore the benefits and ES of urban greenery from the assessment results. A variety of methods are also used for ES assessment, and it is not unusual to combine greenery indicators with other benefit-related information for correlation and regression analysis (Hillsdon et al., 2006). Other methods include exploring the environmental benefits of urban forests using tree inventories and tree data (Riondato et al., 2020) and modelling the impact of greenery on urban climate (Shashua-Bar and Hoffman, 2000).
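As an illustration of the NDVI calculation mentioned above, the standard formula NDVI = (NIR - Red) / (NIR + Red) can be sketched in Python. This is a minimal sketch only: the function names, the nested-list band format, and the convention of returning 0 for pixels with no signal are assumptions made for this example and are not taken from the reviewed studies.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed convention for pixels with no signal
    return (nir - red) / total


def mean_ndvi(nir_band, red_band):
    """Average NDVI over two equally sized bands given as nested lists."""
    values = [
        ndvi(n, r)
        for nir_row, red_row in zip(nir_band, red_band)
        for n, r in zip(nir_row, red_row)
    ]
    return sum(values) / len(values)
```

Values close to 1 indicate dense, healthy vegetation, while values near or below 0 indicate bare surfaces, buildings, or water.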

Street view (SV)
SV images are photographs (including panorama photographs) taken from street level. They contain location information and can be mapped to create interactive and immersive landscapes (Alvarez León and Quinn, 2019). Researchers can bulk download these images through APIs for deeper analysis (Li, 2021).
At present, Google is the largest provider of SV in the world. By 2017, Google-owned SV images had covered more than 10 million miles of roads in 83 countries, and this number is continually increasing (Raman, 2017). Similar SV providers include Microsoft, Tencent and Baidu. Services of the latter two are mainly provided in China as substitutes for Google (Cheng et al., 2017). Moreover, crowdsourced SV platforms, which use geotagged photos taken and uploaded by ordinary users as image sources (e.g., Mapillary and OpenStreetCam), are also developing rapidly (Mahabir et al., 2020).
Most SV images are taken by vehicle-mounted cameras. Typical SV collection equipment is installed on top of a vehicle, slightly higher than the human eye (≈2.5 m), and contains sub-cameras facing multiple orientations whose pictures are stitched into panoramas (Anguelov et al., 2010; Ringland et al., 2019). Similar components can also be mounted on tricycles, sleds, or backpacks to reach terrain that cars cannot (Tung, 2018) (Fig. 1).
SV allows the urban environment to be assessed from ground and street level, an approach that has unique advantages over aerial and satellite imagery. Firstly, results of urban environmental assessments using SV are closer to human perception (Yang et al., 2009; Leslie et al., 2010). Secondly, SV images cover the vertical dimension of the urbanscape. Thirdly, SV images contain detailed information about streets, especially small geographical objects (e.g., billboards and traffic signs) (Balali et al., 2015; Egli et al., 2019). Finally, the technology allows virtual assessment and automatic data collection. Compared with some traditional urban greenery assessment methods, such as fieldwork for tree inventory creation and management, it reduces research time and workload, and increases accessibility because users can study urban greenery without physically visiting the field site. Owing to these advantages, studies using SV as a tool for urban environmental research are on the increase (Biljecki and Ito, 2021). The application of SV covers a broad range of disciplines, including social and economic aspects, e.g. health research (Rzotkiewicz et al., 2018; Kang et al., 2020), built environment and land use assessment (He and Li, 2021), and urban gentrification research (Ilic et al., 2019). It also covers environmental aspects, such as research on the urban natural environment and green infrastructure, on which this article focuses (Biljecki and Ito, 2021; Cinnamon and Jahiu, 2021).
This review focuses on how urban greenery has been assessed using SV as an innovative method of urban green infrastructure research. It considers new types of SV-derived information that cannot be obtained from traditional aerial and satellite photos, in particular information related to human perspectives such as the Green View Index (GVI) and Sky View Factor (SVF).

Literature searching and screening
In order to systematically search for articles that used SV images in urban greenery research, this article follows the PRISMA Statement proposed by Moher et al. (2009) (Fig. 2). We collected and screened literature using keyword searches on both the Web of Science and Google Scholar. The following keywords were used for the search: 'street view' or 'street image' or 'street imagery' or 'street-level image' or 'street-level imagery' or 'panoramic image' and 'greenery' or 'greening' or 'greenness' or 'green space' or 'greenspace' or 'trees' or 'vegetation'. After removing duplicates from the results of searches using combinations of the above keywords, 611 articles were listed as the preliminary search results. All these articles were published before April 2022.
Then, articles were screened for inclusion in this review using the following criteria: 1) Research articles published in English-language scientific journals. 2) Articles that had concluded the peer-review process. 3) Articles of empirical research. 4) Articles where SV images were used as a tool (or one of the tools) to assess or map urban greenery or further explore ES of urban greenery. SV data sources were limited to open source/commercial SV images (e.g. Google), while articles using street images captured by researchers themselves were excluded. Articles that use SV as one but not the main method for data collection and analysis are included in our review.
Following these criteria, 135 articles were reviewed.

Data analysis
Basic information for each article was extracted, including year of publication, study location, urban greenery type, the method of urban greenery data collection, whether ES of urban greenery were researched, and if so, what kind of ES was explored. A qualitative review was also conducted to ascertain the key findings and limitations of SV discussed within the articles.

Overview
SV is a new but increasingly popular tool for urban greenery research. The 135 articles included in this review were all published between 2010 and April 2022. Between 2010 and 2016, only 8 articles on urban greenery research using SV were published in total. However, the number of publications increased rapidly from 2017. In the five-year period from 2017 to 2021, the annual number of published articles was 6, 16, 18, 25, and 50 respectively (Fig. 3). Fig. 4 shows that studies on urban greenery using SV are spatially clustered. Most studies were conducted in North America, East Asia (e.g., China, Japan, and South Korea), and Europe. Several Southeast Asian, South American, and Australian cities also have multiple research cases. In the Google SV coverage layer of this figure, 'Covered' represents countries where Google-owned SV images cover most cities, towns, and highways, except limited remote regions; 'Partly covered' represents countries where Google-owned SV images cover only some cities, towns, and highways; and 'Not covered' represents countries without Google-owned SV images. The uneven distribution of cities where SV-based research has been conducted may relate to the spatial availability of SV images: SV-based urban greenery research is generally conducted where SV (represented by Google) coverage is high. China is a notable exception due to restricted access to Google services; however, several local companies there offer comparable SV services.
GVI is the most common urban greenery metric extracted from SV images. About 82% of all reviewed articles involved quantitative studies of GVI, followed by object identification (such as identification and location of trees), which accounts for 19% of all reviewed articles (Fig. 5).
Deep learning based on neural networks is the most common method of extracting information from SV images, used in more than half of the reviewed articles. After its first appearance in 2017, the number of cases using this technique grew rapidly over the following four years (4 in 2018, 11 in 2019, 20 in 2020, and 37 in 2021). Indeed, deep learning has become the most used method of extracting GVI and SVF in recent years, and some research also uses it to identify and locate objects such as trees (Laumer et al., 2020; Lumnitz et al., 2021).

Information extraction and data collection
This section reviews how researchers have collected information and data about urban greenery from SV images. Based on the principles of information extraction and the types of extracted data, approaches can be divided into two categories: pixel-based classification of SV images and geographical objects (mainly trees) identification (Fig. 6).

Pixel-based classification
SV raster images consist of pixels recording the intensity of light in different wavelengths (red, green, and blue). Pixels representing various features (vegetation, buildings, sky, etc.) can be distinguished by differences in spectral information, similar to 'geostatistical classification' for aerial and satellite images (Atkinson and Lewis, 2000).

Green View Index (GVI).
Originally proposed by Aoki et al. (1985), GVI is the most widely used metric for assessing urban greenery by SV. GVI is defined as the proportion of the visible green part of the entire field of view (i.e., the visibility of urban greenery) at a ground position (Yang et al., 2009). Aoki et al. (1985) measured the proportion of green colour in street photos taken with a 28 mm focal length. They found that this green view level correlated highly with human perception obtained from respondents' on-site evaluation. Importantly, this means street-level photographs can accurately quantify greenness in urban areas from a human perspective. More recent research comparing GVI with NDVI (Tong et al., 2020) shows that GVI focuses more on areas near streets (i.e., areas highly accessible to residents) and can better reflect residents' actual exposure to greenery (Ye et al., 2019). GVI is therefore considered a better indicator of greenery, particularly when examining the relationship between greening and residents' health or activities (Villeneuve et al., 2018; Yu et al., 2021). These studies embody the unique advantages of GVI as a new indicator in urban natural environment research.
SV is a commonly used data source for GVI due to its approximate human-eye perspective. GVI calculated from SV is based on the proportion of pixels representing vegetation in a panorama SV image. However, distortion arises when panorama images are flattened into planes. Therefore, in practice, non-panorama photos in four directions (e.g., true north, east, south, and west, or directions perpendicular and parallel to the roads), instead of the entire panorama photo, are selected for pixel extraction (Yang et al., 2009; Long and Liu, 2017; Lu, 2019). In some refined research cases, the number of directions is even higher, such as six (Dong et al., 2018), ten (Xia et al., 2021a), and three vertical angles (upper, middle, and lower) in 6 directions (18 directions in total) (Li et al., 2015).
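To make the multi-direction sampling concrete, the sketch below builds one image request per compass heading for a single sample point. It assumes Google's Street View Static API and its `size`, `location`, `heading`, `fov`, `pitch`, and `key` request parameters; the function name `heading_urls`, the default values, and the placeholder key are hypothetical, and other SV providers use different endpoints and parameters. The code only constructs URLs and does not send any request.

```python
from urllib.parse import urlencode

# Endpoint assumed here: Google's Street View Static API.
BASE_URL = "https://maps.googleapis.com/maps/api/streetview"


def heading_urls(lat, lon, api_key, n_headings=4, fov=90, pitch=0, size="640x640"):
    """Build one request URL per compass heading around a sample point."""
    step = 360 / n_headings
    urls = []
    for k in range(n_headings):
        params = {
            "size": size,
            "location": f"{lat},{lon}",
            "heading": round(k * step),  # 0, 90, 180, 270 for four headings
            "fov": fov,                  # horizontal field of view in degrees
            "pitch": pitch,              # 0 = camera held level
            "key": api_key,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls
```

With four headings and a 90-degree field of view, the images together cover the full horizon; setting `n_headings=6` and repeating the call for three `pitch` values would approximate the 18-view design described above.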
The extraction of green pixels from SV images is therefore a critical step of GVI calculation. Commonly used extraction methods can be divided into three categories (Table 1). Manual screening is the most traditional method, in which green pixels are selected using the 'magic wand' tool of Photoshop (or similar) (Yang et al., 2009). Unsupervised classification is a common GVI extraction method in literature published before 2018. For example, Li et al. (2015) proposed an unsupervised classification method that uses the three bands (red, green and blue) of images to identify pixels in which green dominates. Using MATLAB, some other studies conduct unsupervised classification based on the HSV (i.e., hue, saturation, value) colour model, extracting pixels with a hue between 60 and 180 (Long and Liu, 2017) or between 75 and 170 (Dong et al., 2018). Finally, supervised learning has been increasingly widely used in recent years; it includes support vector machines, backpropagation, and the more popular semantic image segmentation (including PSPNet, SegNet, and DeepLab) (Table 1). Among them, deep learning based on semantic segmentation is the most used method to extract GVI from SV images, especially after 2019. In total, 71 articles (68 after 2019) used this method to extract GVI, underlining the development of deep learning technology as an important factor in the development of SV-based research.
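The HSV-based rule can be sketched as follows. This is a minimal illustration, not the MATLAB procedure of the cited studies: it uses Python's standard `colorsys` module, represents each image as nested lists of (R, G, B) tuples with 0-255 channels, and pools green and total pixel counts over all view directions. The function names and the optional `min_sat` and `min_val` thresholds are hypothetical additions; with their default of 0 only the hue range of 60 to 180 degrees is applied.

```python
import colorsys


def is_green(rgb, hue_min=60.0, hue_max=180.0, min_sat=0.0, min_val=0.0):
    """Classify one (R, G, B) pixel with 0-255 channels by its hue in degrees."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # h, s, v are all in [0, 1]
    return hue_min <= h * 360.0 <= hue_max and s >= min_sat and v >= min_val


def green_view_index(images, **thresholds):
    """Pool green and total pixel counts over all view directions."""
    green = total = 0
    for image in images:  # one image per direction
        for row in image:
            for pixel in row:
                total += 1
                green += is_green(pixel, **thresholds)
    return green / total if total else 0.0
```

For example, `green_view_index(images, hue_min=75, hue_max=170)` applies the narrower hue range. Raising `min_sat` would exclude near-grey pixels whose hue happens to fall in the green range, which is one way to limit the misclassification noted for colour-based methods.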
Following green pixel extraction, researchers can quantify and map the distribution of GVI in urban areas. A high-profile example of this approach is 'Treepedia', which visualized the spatial distribution of GVI in 30 cities around the world (Abbati, 2019). Moreover, the resulting GVI data and maps have led to further research. For example, comparative studies were carried out to analyze differences in GVI distribution between cities and to explore the factors influencing GVI (Long and Liu, 2017; Xiao et al., 2021). The results show that landscape patterns may be an important driving factor. More fundamentally, local policy can influence the distribution of urban greenery. Li (2021) combined GVI and census data, and the results indicate the existence of green inequality in New York, where minority populations usually live in communities with less greenery. Another example, in Singapore, illustrates the role of GVI assessment in urban planning, where it helps determine the priority of greening interventions for each street (Ye et al., 2019).

Sky View Factor (SVF).
SVF measures the proportion of the field of view occupied by sky at a ground position (Liang et al., 2017). SVF is one of the dominant factors affecting urban microclimate. A lower SVF usually means a greater height/width ratio within the street canyon and therefore less available solar radiation, which helps reduce daytime street temperatures and improve residents' thermal comfort (Sanusi et al., 2016). Although SVF is not a direct parameter of urban greenery, green elements, especially tree crowns, form significant components of the non-sky hemisphere. Measurements of SVF are often obtained from fish-eye photography, including SV-based approaches (Al-Sudani et al., 2017; Miao et al., 2020). SV-based SVF measurement is an extension of fish-eye photography methods, but it has a unique and abundant means of data collection and analysis.
Similar to fisheye imagery, SV images can cover the entire field of view above the point of capture. Other approaches include the use of a 3D vector database or a digital elevation model (DEM) analysed via ray-tracing algorithms in a GIS (Gal et al., 2009; Kastendeuch, 2013). Both methods are comparable, but some studies found the result of the photographic method to be lower, and the difference between the results of the two methods is positively correlated with tree coverage (Gong et al., 2018). The fundamental reason for this difference is that the models frequently used by computational methods only contain building data but not trees. Thus, some researchers have defined the difference between the SVF measured by photographic and computational methods as a quantitative value for tree shading services (Li et al., 2021).
Extracting sky pixels from SV images is a critical step in SVF measurement. In true colour SV images, sky pixels have distinctive properties (high lightness and/or blue hue). Researchers have developed different methods for identifying and extracting sky pixels based on these properties (Table 2). As with GVI measurement, machine learning and classification (especially deep learning) are applied in most of the research cases we reviewed because of their high level of automation and accuracy.
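Once sky pixels have been classified, an SVF value can be derived from the resulting sky mask. The sketch below is one possible approach and not necessarily the one used in the reviewed studies: it assumes a square, upward-looking fisheye mask with an equiangular projection (radial distance proportional to zenith angle), splits it into concentric rings, and weights the sky fraction of each ring by the cosine-weighted share of the hemisphere that the ring represents. The second function expresses the tree shading value described above as the difference between a buildings-only modelled SVF and the photographic SVF. All function and parameter names are hypothetical.

```python
import math


def sky_view_factor(sky_mask, n_rings=36):
    """Estimate SVF from a square fisheye sky mask (rows of booleans, True = sky).

    Assumes the image circle touches the mask edges and that the projection
    is equiangular, so radial distance is proportional to zenith angle.
    """
    size = len(sky_mask)
    centre = (size - 1) / 2.0
    radius = size / 2.0
    sky = [0] * n_rings
    total = [0] * n_rings
    for y, row in enumerate(sky_mask):
        for x, is_sky in enumerate(row):
            r = math.hypot(x - centre, y - centre)
            if r >= radius:
                continue  # outside the image circle
            ring = min(int(r / radius * n_rings), n_rings - 1)
            total[ring] += 1
            sky[ring] += bool(is_sky)
    weighted = weight_sum = 0.0
    for i in range(n_rings):
        if total[i] == 0:
            continue  # ring too thin to contain any pixel centre
        # Relative cosine-weighted contribution of ring i to the hemisphere.
        w = math.sin(math.pi * (2 * i + 1) / (2 * n_rings))
        weighted += w * sky[i] / total[i]
        weight_sum += w
    return weighted / weight_sum if weight_sum else 0.0


def tree_shading(svf_buildings_only, svf_photographic):
    """Difference between modelled (buildings only) and photographic SVF."""
    return svf_buildings_only - svf_photographic
```

The ring weights give mid-zenith obstructions more influence than those near the horizon or directly overhead, which is why a simple count of sky pixels over the whole image is only a rough proxy for SVF.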

Object-based identification
Object-based assessment is another way of assessing urban greenery using SV images. SV images are widely used for manual or automatic identification of street furniture (telegraph poles, traffic signals, signboards, etc.) in cities (Campbell et al., 2019; Toaha et al., 2020). In urban greenery studies, this method is mostly used for tree identification and data collection, although other types of green infrastructure, such as gardens and parks, can also be assessed by SV.

Virtual survey: tree inventories.
In urban tree research, a widely used method is generating tree inventories, which count the number of urban trees and collect specific information about these trees (Morgenroth and Östberg, 2017). However, compiling and regularly updating tree inventories is labour-intensive. Although some researchers have introduced photographic methods (e.g., laser scanning) to reduce the workload of tree information collection (Calders et al., 2015, 2020), the research costs are high. SV has the advantages of wide spatial coverage, frequent updates (especially in major cities in the United States and Europe), and low cost.

Table 1. Comparison of SV-based GVI measuring methods. This section focuses on the methods of data extraction, so the typical cases that appear in this table are not limited to articles included in the review.

| Method | Typical cases | Advantages | Disadvantages |
|---|---|---|---|
| Manual screening | Yang et al. (2009) | High accuracy because of manual operation; can be used for verifying the results of machine learning | Labour-intensive work, which needs a significant amount of time and workforce |
| Unsupervised classification, RGB-based | Li et al. (2015) | Simple operating principle and easy to operate; research time can be saved | Classification accuracy is easily affected by external factors (light, weather, etc.); non-vegetation green pixels (e.g., green walls and traffic signs) are easily misidentified |
| Unsupervised classification, HSV-based | Long and Liu (2017) | As for RGB-based | As for RGB-based |
| Supervised learning | | High accuracy (when enough samples are used for machine learning); highest level of automation and suitable for big data analysis | Requires computing and coding knowledge from researchers; large numbers of training samples needed |
The tree parameter data collected by SV-based tree surveys are similar to those collected by traditional on-site surveys. In most cases the DBH and species (genus or species level) were measured or identified. Other information that can be collected includes the height and crown size of trees (Wang et al., 2018; Meunpong et al., 2019; Ulus et al., 2021), health condition, etc. (Berland et al., 2019). Data are collected by visual interpretation, generally by professionals or trained personnel to maximize accuracy. Wang et al. (2018) also tried to use standardised reference objects, such as fixed-width lanes and traffic lines, road curbs, and limewhite on trees (lime water brushed on tree trunks to a height of approximately 1.2 m, commonly used on street trees in Chinese cities to prevent pests and frost damage), to further improve identification accuracy. However, only the limewhite approach achieved a satisfactory result. Moreover, some manual identification results (e.g., species) can be used as training data for machine learning. With enough training data, fine-grained object recognition can be used to identify tree species based on their characteristics, making automated SV image interpretation possible (Lumnitz et al., 2021). Studies that use machine learning to identify trees are reviewed in detail in the next section.
Table 3 summarises the results of four case studies that use SV-based manual surveys to collect urban tree data, in order to assess the performance of this method in practical application. All these studies use on-site survey results as the standard against which the SV assessment results are compared. A direct comparison across studies is not straightforward because the parameters used to measure the accuracy or precision of the same tree data are not consistent between research cases. Indeed, Table 3 shows that the identification accuracy or precision of various tree parameters varies significantly between research cases.
According to these cases, the performance of SV for manual identification of tree species is related to the urban tree species structure. Specifically, urban areas with a simple tree species structure have higher identification accuracy, and certain species have significantly higher identification accuracy than others (Berland et al., 2019). The precision of tree size data collection is related to the data collectors' level of expertise (Berland and Lange, 2017) and to whether appropriate reference objects are used (Wang et al., 2018).
Apart from urban trees, SV-based virtual surveys have also been applied in assessments of other types of urban greenery. For example, elements related to urban greenery and plants, such as animal nests, can also be identified by SV (Rousselet et al., 2013). Some studies use SV for community and neighbourhood natural environment scoring, as a part of urban natural environment assessment (Clarke et al., 2010; Wu et al., 2014). In these studies, virtual and on-site surveys showed high agreement on the rating items related to urban greenery, suggesting that virtual surveys can accurately conduct this type of assessment.

Deep learning for automatic identification of objects.
Although SV-based manual identification of urban greenery saves research cost and time compared with on-site surveys, it still has a low level of automation, and regional-scale assessment is labour-intensive. Because of this disadvantage, some researchers have explored the feasibility of identifying objects (mainly trees) by deep learning.
Convolutional Neural Networks (CNN) are the main technique for automatic tree detection and were used in all the deep learning cases reviewed. CNN-based methods also include methods derived and improved from CNN, such as Faster R-CNN for high-efficiency analysis (Laumer et al., 2020), Mask R-CNN, which is modular, general, and flexible (Lumnitz et al., 2021), and the Part Attention Network for Tree Detection (PANTD), which can be used for detecting occluded trees. After training on a significant number of images, CNNs can automatically segment tree instances from SV images.
However, compared with manual virtual surveys, the application of deep learning is still in its infancy. At present, most studies only segment tree objects to collect primary data such as tree counts and locations. In these cases, the detection rate of trees can generally exceed 70% (Wegner et al., 2016; Branson et al., 2018; Lumnitz et al., 2021), but the positioning accuracy varies significantly between studies. For example, based on triangulation, Lumnitz et al. (2021) successfully located 93% of street trees in Vancouver, while the study of Laumer et al. (2020) correctly assigned coordinates for only 38% of street trees. The reasons for this difference, apart from study design, might be SV image properties (such as the distance between shooting locations and trees), urban environments, and tree properties (such as species structure). Some studies have also focused on tree information other than counting and localization. For example, Khan et al. (2021) used a Siamese Convolutional Neural Network (SCNN) to assess the health of eucalyptus; Branson et al. (2018) tried to identify the species of street trees and achieved an accuracy of more than 80%. This result is close to that of manual identification in the previous section, indicating that machine learning might have an identification ability comparable to humans. However, species identification accuracy is related to multiple factors such as local tree species structure and SV image attributes, and there are currently only a few studies using deep learning to identify tree species. It therefore remains uncertain whether deep learning can replace manual identification until more case studies are conducted in different cities.
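The geometric idea behind triangulation-based positioning can be sketched as the intersection of two sight lines from different capture points. This is generic plane geometry and not the specific procedure of Lumnitz et al. (2021): it assumes camera positions already projected into a local planar (east, north) frame in metres and bearings to the detected tree measured in degrees clockwise from north. The function name and its arguments are hypothetical.

```python
import math


def locate_by_triangulation(cam1, bearing1, cam2, bearing2):
    """Intersect two sight lines in a local planar (east, north) frame.

    Bearings are in degrees clockwise from north. Returns the (east, north)
    intersection, or None when the lines are parallel or the crossing point
    lies behind either camera.
    """
    d1 = (math.sin(math.radians(bearing1)), math.cos(math.radians(bearing1)))
    d2 = (math.sin(math.radians(bearing2)), math.cos(math.radians(bearing2)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product d1 x d2
    if abs(denom) < 1e-9:
        return None  # parallel sight lines never intersect
    dx, dy = cam2[0] - cam1[0], cam2[1] - cam1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom  # distance along ray 1
    t2 = (dx * d1[1] - dy * d1[0]) / denom  # distance along ray 2
    if t1 <= 0 or t2 <= 0:
        return None  # intersection is behind at least one camera
    return (cam1[0] + t1 * d1[0], cam1[1] + t1 * d1[1])
```

Small errors in bearing grow with the distance between the capture point and the tree, which is consistent with the observation above that the distance between shooting locations and trees may affect positioning accuracy.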

Assessment of ecosystem services of urban greenery
Due to urban greenery's benefits and its promotion of sustainable urban development, assessing the ES of urban greenery is the major application of urban greenery assessment. Many case studies used urban greenery data extracted from SV images to assess three aspects of ES (regulating, cultural, and supporting) of urban greenery. In total, 80 articles were found that explored the various ES of urban greenery. Among these articles, the association between the quantity of urban greenery and residents' health and activity was explored the most (40 of 80 studies). The cooling benefits of trees were discussed in 10 articles, making this the second most discussed topic. One study showed qualitatively that areas with high tree density (tree view factor > 0.6) have relatively low mean radiant temperature (MRT), but it did not quantify the extent to which trees contribute to cooling. Li et al. (2022) evaluated the link between surface temperatures and the view indexes of three types of vegetation based on spatial regression. This research showed that the spatial distribution and mechanism of the cooling benefits differ between types of urban greenery. For example, trees and grasses have the most obvious cooling effect in commercial areas and areas with dense roads and buildings, while shrubs have the greatest cooling effect in industrial areas.
Most SV-based urban thermal environment studies focus on SVF and discuss the reduction of solar radiation by trees, which is the major mechanism of trees' cooling effect in urban areas (Wang et al., 2016). For example, two studies showed that trees reduced SVF in Boston by 24.61% (whole urban area) and 18.52% (downtown), while a research case in Harbin, China estimated an SVF reduction of 42.5% from trees (i.e., an average shade effectiveness of 56.3%). From these papers, it can be concluded that the impact of trees on street SVF varies widely across cities. The differences may come from factors such as the attributes of urban trees (e.g., species and size), roads (e.g., road width), and buildings. Moreover, Richards and Edwards (2017) found that urban trees in Singapore reduce solar radiation by approximately 8%. Apart from reducing solar radiation and SVF, a minor mechanism of urban greenery's cooling effect is transpiration (Wang et al., 2016). However, no SV-based cases that explore this topic were found.

Air pollution remediation.
Only two articles were found that used SV to explore the effects of urban greenery on air pollutants. Using regression, Wu et al. (2020) demonstrated negative associations between GVI and air pollution data (NO2, two size ranges of particulate matter, PM2.5 and PM10, and air quality index) in three Chinese cities. The results show that all four air quality parameters are negatively correlated with GVI, which suggests there is a relationship between urban greenery and better air quality. Huang et al. (2022) explored the relationship between PM2.5, PM10 and urban morphological indicators including GVI and SVF, and developed predictive models. However, these studies did not specify whether the primary mechanism is direct pollutant removal by plants or an aerodynamic effect, i.e. whether the urban greenery primarily influences the dispersion of pollutants.

Supporting services
Urban greenery is an essential part of urban ecosystems, providing habitat and food sources for animals. The quantity of urban greenery is a determining factor in the biodiversity level in urban areas. Using SV images, researchers can identify objects related to animals and habitats to assess the state of urban biodiversity. For example, Rousselet et al. (2013) identified the nests of pine processionary moths and evaluated the quantity and distribution of this species.
On the other hand, invasive species may damage the fragile original ecosystem in urban areas and adversely affect local biodiversity. In current studies, SV images are also used for species identification of urban plants (including woody and herbaceous plants) (Ulus et al., 2021). This helps to understand the urban plant species structure, especially the proportion of invasive species, which contributes to determining the impact of invasive species on local ecosystems.

Health and activities.
This review has highlighted that the relationship between urban greenery and residents' health and activities is one of the most researched fields. The topics include the morbidity of physical diseases (e.g., obesity, asthma), mental health, travel behaviour (walking time and propensity to walk or cycle), and indices related to physical activities such as the frequency and duration of exercise (jogging, cycling, etc. for leisure purposes).
The methods used by these studies are similar. Firstly, GVI was used as an indicator of urban greenery without exception. Secondly, almost all the cases used regression to explore the link between urban greenery and health or activities. There are only two exceptions: correlation analysis used by Zang et al. (2020) and structural equation modelling used by Wang et al. (2021), which explored the mechanisms by which urban greenery influences mental health. Most of the results of these studies supported the view that urban greenery (using GVI as an indicator) was associated with residents' health and activities. Moreover, some studies compared SV-based assessment with NDVI, a traditional urban greenery measure, and found that the regression results of the two approaches were inconsistent. Associations often emerge more readily in SV-based studies (Villeneuve et al., 2018; Helbich et al., 2019). This suggests that SV-based assessment is closer to residents' actual green space exposure. Finally, Wang et al. (2021) discussed the mechanism by which urban greenery affects mental health. The quantity of greenery affects mental health by reducing harm (e.g., pollution removal), while high-quality green space can improve residents' mental health through its restoring capacities (stress and life satisfaction) and building capacities (stress and life satisfaction).
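The comparison between SV-based and NDVI-based regression described above can be illustrated with a minimal sketch. The data below are synthetic and constructed under the assumption that the outcome is driven by eye-level greenness, with NDVI acting as a noisier proxy; the variable names, coefficients, and noise levels are illustrative and are not taken from any reviewed study.

```python
import numpy as np

def ols_fit(x, y):
    """Fit y ~ intercept + slope * x by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), float(coef[1])

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D samples."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic neighbourhood-level data (assumption: for illustration only).
rng = np.random.default_rng(0)
n = 200
gvi = rng.uniform(0.0, 0.6, n)                          # eye-level greenness
ndvi = 0.3 + 0.5 * gvi + rng.normal(0.0, 0.15, n)       # noisy overhead proxy
walking_minutes = 20 + 30 * gvi + rng.normal(0.0, 3.0, n)

for name, exposure in (("GVI", gvi), ("NDVI", ndvi)):
    intercept, slope = ols_fit(exposure, walking_minutes)
    r = pearson_r(exposure, walking_minutes)
    print(f"{name}: slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

Because the synthetic outcome is generated from GVI by construction, the stronger GVI association here only illustrates how the two exposure measures can give inconsistent regression results; it is not evidence for either measure.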

Environmental perception.
Environmental perception refers to the feelings brought by environments, such as pleasure, happiness, and safety. Researchers can explore the role of urban greenery in perception by analysing the association between urban greenery data from SV and people's feelings. For example, using a multilevel regression model, Jing et al. (2021) showed that GVI was associated with reduced fear of crime among residents. SV images can also directly measure the human perception of urban environments. Using questionnaires that investigated people's feelings about SV images, Quercia et al. (2014) showed a link between the quantity of greenery and perceptions of beauty, quietness, and/or happiness. Similarly, using SV image-based visual perceptual scores, Cheng et al. (2017) examined the association between attributes of streets and people's feelings. The results showed that GVI has a positive association with perceptual scores, while the sky-openness index (a concept similar to SVF) is negatively associated with perceptual scores in the interval below 0.2. This shows the positive effect of urban greenery on improving human perception. On the other hand, a visual preference survey of cyclists shows that SV images with high-density trees tend to have lower preference levels because of their low visibility (Evans-Cowley and Akar, 2014). This suggests that greenery may also be a hindrance to safety.

Current state of the art
Whilst urban greenery research based on SV does not have a long history, the development of deep learning techniques means the number of research articles in this field is increasing rapidly, especially in the past five years. Among these, the number of studies exploring the ES of urban greenery increased from 2 in 2017 to 35 in 2020. This demonstrates the emergence and acceptance of SV as a tool to assess urban greenery for urban environmental research.

Research advances enabled by SV
Based on the reviewed articles, we found that due to its advantages, SV plays a unique new role in urban greenery research, helping topics to be explored that were previously challenging. We categorise this into three aspects: (1) savings in research time, cost, and workload, (2) reflection of human perception, and (3) the ability to observe vertical profiles of urban greenery.
The reduction of research time, cost, and workload makes SV potentially more promising than traditional research methods, both in pixel- and object-based studies. SV-based greenery measurement provides large-scale quantitative data, and this advance enables big-data analysis at the scale of cities or larger, for example across megacities or metropolitan conurbations. Such large-scale analysis of the individual components of urban greenery would not be possible using traditional methods, like the landscape simulation and visual character method introduced by Velarde et al. (2007) and Ode et al. (2009). SV can also be used to identify, count, and even measure objects, replacing relatively inefficient fieldwork (Wang et al., 2018; Berland et al., 2019).
We believe that the advancement of methodology, in particular the maturity of deep learning technology, is a major factor that led to the above advantage of SV. This factor is particularly applicable to pixel-based studies. Reflected in the number of articles, deep-learning-based studies accounted for most of the articles we reviewed (ca. 53% of the 135 studies). Since 2018, the number of studies using this method has grown exponentially. Deep learning has multiple advantages over traditional data analysis methods (Tables 1, 2). In recent years, the development of more advanced neural networks and the building of richer training sets (such as ADE20K and Cityscapes) have improved the accuracy of deep learning, allowing researchers to collect GVI and SVF data rapidly and on a large scale.
Second, SV's new perspective reflects people's eye-level greening, which is difficult to achieve with traditional urban greenery research methods. Among them, GVI, the most common greenery index collected by SV, is especially close to the greenness perceived by people in daily life (Yang et al., 2009). In contrast, traditional methods like NDVI based on top-down perspectives focus more on the area of green space but not people's perceptions. For example, in the research case in Gothenburg, Sweden (Knez et al., 2018), it is difficult to notice the apparent difference in the green space area of the six sample areas by using remote sensing images. However, based on eye-level perspectives, obvious discrepancies are shown in the greenness perceived by residents.
The close relationship between SV and perception has led to many studies exploring the relationship between GVI and indicators related to health and well-being. When comparing SV with traditional methods, we found that SV showed a stronger connection between greenery and well-being than traditional remote sensing measurements in many studies (Villeneuve et al., 2018; Helbich et al., 2019; Wang et al., 2021), which suggests that SV may better reflect residents' greenness exposure. However, a few studies have results that do not fully support this conclusion (Helbich et al., 2021). The differences between cities may be a reason for this phenomenon. Still, more importantly, the defects of SV itself, such as the limitations of observing parks, private gardens, etc., may also affect its performance (see Section 4.2).
In addition to measuring eye-level perception of greenness, SV can also be used to assess vertical profiles of urban greenery. This allows urban greenery to be considered alongside vertical profiles of other environmental parameters such as air quality and temperature, opening up exciting new areas for urban research such as understanding the role of trees in street canyons on solar radiation reduction and air pollutants dispersion. In particular, the effect of greenery on air quality has not been widely explored, and there is much potential for research in this area, related to the benefits and disbenefits of trees on air pollution concentrations.
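The pixel-based indices discussed in this section reduce, at their simplest, to the share of image pixels assigned to a class by semantic segmentation. The following is a minimal sketch under stated assumptions: the class ids are hypothetical (datasets such as ADE20K and Cityscapes define their own label schemes), and the sky-pixel share is only a simplified stand-in for SVF, which in practice is usually derived from a hemispherical (fisheye) projection.

```python
import numpy as np

# Hypothetical class ids for illustration; real label schemes differ.
VEGETATION_IDS = (1,)
SKY_IDS = (2,)

def class_fraction(label_map, class_ids):
    """Share of pixels in a segmentation label map belonging to class_ids."""
    label_map = np.asarray(label_map)
    if label_map.size == 0:
        raise ValueError("label_map must contain at least one pixel")
    return float(np.isin(label_map, class_ids).mean())

# Toy 4 x 4 label map: 0 = other, 1 = vegetation, 2 = sky.
labels = np.array([
    [2, 2, 2, 2],
    [1, 1, 0, 2],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
])

gvi = class_fraction(labels, VEGETATION_IDS)       # 4 of 16 pixels
sky_share = class_fraction(labels, SKY_IDS)        # 5 of 16 pixels
print(f"GVI = {gvi:.4f}, sky share = {sky_share:.4f}")
```

Applied to the label maps predicted for many images, the same function yields the large-scale GVI data described above; the quality of the result depends entirely on the segmentation model that produces the labels.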

Emerging applications
SV-based research plays an important role in sustainable urban planning, in particular in assisting green infrastructure planning. For pixel-based data, many studies, such as Dong et al. (2018) and Yu et al. (2019), use SV to identify roads that have unsatisfactory levels of greenness at the human eye level. This helps prioritise planning decisions and green infrastructure investment. Moreover, these types of assessments can be combined with socioeconomic indicators in order to understand green inequality and environmental injustice (Li et al., 2015; Wang et al., 2021).
Similarly, as object-based data, trees can be mapped to support urban forest monitoring, management, and planning (Beery et al., 2022). Yang et al. (2009) stated that planners could design green infrastructure reasonably by choosing location, size, and species of plants to maximize the visibility of greenness. However, due to the limits of survey methods, practical applications in this area are still uncommon (See Section 4.1.3 for details).
In addition to the practical application of urban greenery data from SV itself, research on urban greenery's ecosystem services has also been applied in the urban planning field. Research on cultural services shows the potential of SV in health-oriented urban planning. For example, SV-based studies have demonstrated a link between urban greenery and physical activities; as such, information from SV can provide theoretical support and data reference for activity-friendly urban design (Anderson et al., 2017; Lu, 2019).
Another application field is urban climate regulation, which is also based on pixel-based data. Although it is not clear whether the urban greenery data have been applied to urban design and climate improvement, it is widely discussed in the ten reviewed articles on related topics that SV has the potential to inform the planning of street trees so that they can effectively provide shading and cooling (Du et al., 2020; Li et al., 2021).

Current research gaps
As mentioned in Section 4.1.1, the development of methodology (deep learning) has led to the popularity of SV-based studies. However, deep-learning-based semantic segmentation is currently used mainly to measure the proportion of pixels of a given class in SV images (GVI and SVF). In contrast, its application to object recognition, counting, and measurement (such as tree species recognition and tree crown size and DBH measurement) is rare. We suggest that, due to the complexity of objects and backgrounds, segmenting objects from SV images and then counting them, identifying species, or taking measurements requires a larger amount of training data than pixel-based extraction. Meanwhile, the relevant training datasets are far fewer than those for pixel-based semantic segmentation, further increasing the challenge of conducting such research (Branson et al., 2018).
This technical bottleneck of deep learning causes the homogeneity of SV-based research. The majority of the studies reviewed here (104/135) use pixel-based segmentation. In contrast, without enough support from deep learning, manual identification is still used in object-based research (Berland et al., 2019; Ulus et al., 2021). Although this method is less costly and time-consuming than traditional fieldwork, it is less common as it does not make full use of the cost advantage of SV in the way that pixel-based data collection does.
In urban greenery ES research, most studies focus on the impact of greenery on residents' health and activity (i.e., cultural services), which is a topic well suited to pixel-based data. In contrast, there are limited studies of other ES provided by urban greenery, for example, regulating services such as air quality and carbon sequestration and supporting services such as biodiversity. Only 2 and 3 articles discuss air quality and supporting services, respectively, and no articles related to carbon were found. This may be because evaluating these ES requires additional data on greenery beyond pixel-based segmentation, for example, the shape and size of trees or shrubs for exploring the role of trees in the dispersion of air pollutants (Jeanjean et al., 2017), or the species and DBH of trees for calculating their carbon sequestration and air pollutant removal (Nowak, 2020). These research gaps limit the practical application of SV in some aspects. In the urban forest management field, due to the limited object-based data, attempts to build inventories for street trees or to evaluate the health status of trees have only just started, and are only being conducted in major cities with existing tree data as a basis (Beery et al., 2022).
Assessment of the urban environment (such as air quality improvement and carbon sequestration) and environmentally oriented planning is another major practical topic less addressed in SV-based urban greenery research. Although current SV research on air quality notes that the relationship between the vertical structure of street greenery and air quality can provide support for street planning (Wu et al., 2020), it does not elaborate on how these greenery data can support planning, such as how to determine the location for planting trees or shrubs, or how to inform tree species selection. SV can support research on carbon sequestration by urban greenery by providing greater understanding of the depth and volume of tree canopies, and tree health and species, thereby providing more accurate information to calculate the amount of carbon sequestration. Carbon sequestration is currently estimated by allometric growth equations and existing tree data (object-based) (Nowak et al., 2013). As many cities have declared climate emergencies and have decarbonisation strategies, SV could provide invaluable data for calculating on a city scale the carbon sequestration delivered by existing urban greenery and thereby progress (or lack of progress) towards city decarbonisation targets. However, we have not found any exploration in this field based on SV.

Limitations
As an innovative tool for assessing the urban environment, SV is becoming popular due to its increasingly ubiquitous availability. However, the approach still has disadvantages that limit its current application, leading to homogeneity in the approaches and focus of current research. On the other hand, these limitations represent a research gap and opportunity for the future development of the field.

Image availability
4.2.1.1. Time availability. SV image availability for a location does not always cover every season. This is important in temperate environments where trees may be deciduous. This issue has been mentioned in many of the articles reviewed (Ki and Lee, 2021). For example, some streets only have winter images. These streets often have to be excluded from analyses (Li et al., 2015).
Moreover, the timeliness of SV images is related to the images' update frequency. Images are updated infrequently in some cities and streets, making changes in greenery (such as tree growth, death, migration, and replanting) unable to be considered. The time lag between an SV image and current conditions may also contribute to differences between the virtual survey and the current situation.
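The seasonal screening described above, where streets with only winter images are excluded, can be sketched as a simple filter on image capture dates. The record structure, field names, and the choice of leaf-on months below are hypothetical and assume a northern temperate city; real SV metadata and local phenology would need to be checked for each study area.

```python
from datetime import date

# Assumption: leaf-on season for a northern temperate city (May to September).
LEAF_ON_MONTHS = frozenset({5, 6, 7, 8, 9})

def split_by_season(images, leaf_on_months=LEAF_ON_MONTHS):
    """Split image records into leaf-on (usable) and leaf-off (excluded)."""
    usable, excluded = [], []
    for image in images:
        if image["captured"].month in leaf_on_months:
            usable.append(image)
        else:
            excluded.append(image)
    return usable, excluded

def image_age_days(image, reference_date):
    """Time lag in days between image capture and a reference date."""
    return (reference_date - image["captured"]).days

# Hypothetical metadata records for three sampling points.
images = [
    {"id": "a", "captured": date(2019, 7, 14)},
    {"id": "b", "captured": date(2018, 1, 3)},
    {"id": "c", "captured": date(2020, 9, 30)},
]

usable, excluded = split_by_season(images)
print("usable:", [image["id"] for image in usable])
print("excluded:", [image["id"] for image in excluded])
print("age of image a (days):", image_age_days(images[0], date(2022, 7, 14)))
```

Reporting the share of excluded sampling points and the distribution of image ages alongside the results makes the temporal limitation of a given data set explicit.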
4.2.1.2. Spatial availability. SV services are widespread, but many cities remain unmapped. For example, Google SV images are densely distributed in North America and Europe (Fig. 4). They are also well distributed in East Asia (except China, which has local SV operators as alternatives to Google), Australia, and South America. However, in other regions, especially developing countries in Asia and Africa, the distribution of SV is relatively sparse. This explains why Fig. 4 shows only a few SV-based urban greenery research cases conducted outside North America, East Asia, and Europe: the spatial distribution of SV images limits where this kind of study can be conducted. In addition, privacy laws have hindered SV image collection in some countries (e.g., Germany), which results in a limited local distribution of SV images (only covering major cities and main streets), slow updates, and blurring of details in SV images (Geissler, 2011).
Another aspect of the spatial availability limitation of SV is that most SV images only cover spaces near streets that cars can access. Streets are areas with a large flow of people and concentrate residents' activities, so SV can better reflect pedestrians' real green space exposure. However, it does represent a limitation as SV images cannot assess green infrastructure outside SV's field of view, including interiors of large parks and green spaces, green roofs, part of private gardens, etc. This situation suggests that SV is more suitable for studies focusing on street green infrastructures, like assessing street trees and evaluating their ES. It is also suitable for exploring residents' relationships with urban greening (e.g., greening and effects on health). On the other hand, SV is less effective in comprehensive assessment of urban green infrastructure and research on specific types of green infrastructures (e.g., parks, urban forests, and green roofs) than traditional research methods such as NDVI and on-site surveys.

Accuracy of extraction and identification
The issue of accuracy exists in each type of SV-based assessment. In pixel-based information extraction (GVI and SVF), the classification is based on pixels' attributes, so pixels with similar hue and brightness are easily classified into the same class, even though they may come from entirely different objects. For example, Larkin and Hystad (2019) demonstrated that their HSV-based GVI extraction method could not distinguish non-vegetation green pixels (e.g., traffic signs, green cars) from vegetation pixels. Another factor that causes misclassification is the weather, especially lighting conditions. In particular, SV images taken on sunny days may be partially overexposed (Lauko et al., 2020). Although the application of machine learning can reduce the rate of misclassification, its results still deviate from the actual situation. The results of Liang et al. (2017) show that, compared with manual extraction, the accuracy of SegNet for measuring SVF is between 80% and 99%. Chen et al. (2019) compared three GVI extraction methods (manual classification, PSPNet, and the Back Propagation-based method proposed in their study) and found that machine learning's mean intersection over union (IoU) (PSPNet: 65.4; BP: 64.2) and mean absolute error (MAE) (PSPNet: 4.81; BP: 5.02) were higher than the results of manual extraction (IoU = 63.9; MAE = 4.78). Moreover, the accuracy of machine learning depends on the number of samples. Generally, a large number of learning samples (10 million) are required for machine learning to approach human-level performance (Goodfellow et al., 2016), which is difficult to accomplish.
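The misclassification of green non-vegetation pixels by colour-based extraction, and the use of IoU to score a result against manual labels, can be illustrated with a small sketch. The HSV thresholds and the sample pixel colours below are assumptions chosen for illustration; they are not the thresholds used by Larkin and Hystad (2019) or any other reviewed study.

```python
import colorsys

# Assumed thresholds for "green" in HSV space (hue as a fraction of 360 degrees).
HUE_RANGE = (60 / 360, 180 / 360)
MIN_SATURATION = 0.25
MIN_VALUE = 0.15

def is_green(rgb):
    """Classify an (R, G, B) pixel with 0-255 channels as green by HSV thresholds."""
    red, green, blue = (channel / 255 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(red, green, blue)
    return (HUE_RANGE[0] <= hue <= HUE_RANGE[1]
            and saturation >= MIN_SATURATION
            and value >= MIN_VALUE)

def iou(predicted, truth):
    """Intersection over union of two equal-length boolean masks."""
    intersection = sum(p and t for p, t in zip(predicted, truth))
    union = sum(p or t for p, t in zip(predicted, truth))
    return intersection / union if union else 1.0

# Illustrative pixels with manually assigned vegetation labels.
pixels = [
    (34, 139, 34),    # foliage
    (0, 130, 70),     # green traffic sign (not vegetation)
    (128, 128, 128),  # asphalt
    (135, 206, 235),  # sky
    (85, 107, 47),    # dark foliage
]
truth = [True, False, False, False, True]

predicted = [is_green(pixel) for pixel in pixels]
gvi_predicted = sum(predicted) / len(predicted)
gvi_manual = sum(truth) / len(truth)
print("predicted:", predicted)
print(f"GVI (HSV) = {gvi_predicted:.2f}, GVI (manual) = {gvi_manual:.2f}")
print(f"IoU = {iou(predicted, truth):.3f}")
```

In this toy case the traffic-sign pixel passes the colour thresholds, so the colour-based GVI overestimates the manual value; a segmentation model that uses shape and context can avoid this kind of error but introduces its own.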
The issue of accuracy also exists in object-based research. According to Table 3, the accuracy is still lower than that of on-site surveys. Various factors cause this misidentification. Firstly, insufficient resolution of SV images makes small objects and details of trees (such as leaf shape, trunk colour, and texture, which are important information for judging tree species) difficult to identify. Secondly, factors such as perspective distortion, resolution, and inexperienced surveyors lead to errors in the measurement of object size (such as tree height and DBH). Wang et al. (2018) proposed that using the whitewashed (lime-painted) section of tree trunks, which has a known height, as a scale reference can effectively improve the accuracy of size evaluation. However, this requires significant manual intervention.
Some studies have tried to use deep learning to identify objects (like trees) and extract information (Laumer et al., 2020; Lumnitz et al., 2021). However, there are not enough case studies to demonstrate the efficacy of the approach, especially when extracting relatively complex information such as species. Difficulties in extracting such information may be due to the complex and variable shape of objects (for example, each tree has a different form and size), the low resolution of SV images, and insufficient training sets. This is a further factor leading to the homogenization of SV-based urban greenery research, and more species-specific studies are required to demonstrate the utility of SV images in this field.

Opportunities for future research
SV providers, represented by Google, are surveying more cities and actively updating images in cities with SV services (Google, 2022). Moreover, bicycles and backpack cameras (panoramic cameras that can be carried on the photographer's back, for example, Google Trekker) are becoming popular (Tung, 2018), which means SV may cover interiors of parks and urban green spaces that are difficult for cars to access. As SV's temporal and spatial coverage and resolution increase and data extraction methods improve (e.g., larger deep learning training samples improve accuracy), SV will have fewer limitations and will become a useful tool for urban greenery assessment. Furthermore, the complementarity of SV and traditional research methods allows future comprehensive audits of urban greenery to combine multiple methods, for example combining the advantages of SV in the vertical dimension and street greenness with the advantages of remote sensing images in the top-down perspective and the interior of green spaces, to develop comprehensive urban greenery assessment strategies.
The current deep-learning-based assessment of objects in urban greenery, especially street trees, is limited by technology, but represents an opportunity for future development of the field. At present, deep learning has been used to identify street furniture such as traffic signs and utility poles in SV images (Campbell et al., 2019). With the augmentation of the training sets for trees, this technique may be applied in tree research and management, for example, by building street tree inventories and mapping them. Tree information that at present can only be obtained through high-cost field surveys or virtual manual audits, such as DBH, species, and tree canopy size, could also be obtained in a low-cost and fast manner.
New urban greenery data can promote the diversification of ES research. In particular, regulating services represent a current research gap. For example, only two SV-based studies have discussed the air pollutant remediation of urban greenery, especially street trees, and this is a clear research gap. Street trees have much higher exposure to air pollution than other trees. Conducting city-scale research to explore the effect of street trees on air pollutants is valuable both conceptually, to develop the discipline, and practically, for urban environment decision-makers. Large-scale data collection based on deep learning can provide tree size information to simulate the impact of trees on air pollutant diffusion in street canyons in a 3D urban climate model. Species and DBH information can be used in numerical models, such as i-Tree Eco, for predicting the removal of air pollutants by street trees (Nowak, 2020). Carbon storage and annual sequestration of trees can be estimated using the biomass equation, which evaluates tree biomass and annual biomass increment from species, tree height, and DBH information (Nowak et al., 2013). In addition, with the improvement of the accuracy of object and species identification, SV images can be widely used in urban biodiversity research.
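The biomass-based carbon estimate mentioned above can be sketched with a generic power-law allometric form. This is a simplified illustration and not the equations of Nowak et al. (2013) or i-Tree Eco: it uses DBH only (real equations are species-specific and may include tree height), the coefficients `a` and `b` are hypothetical placeholders, and the carbon fraction of 0.5 is a commonly used assumption.

```python
# Assumption: carbon is taken as half of dry biomass.
CARBON_FRACTION = 0.5

def biomass_kg(dbh_cm, a, b):
    """Generic allometric form: biomass = a * DBH ** b (coefficients are species-specific)."""
    if dbh_cm <= 0:
        raise ValueError("dbh_cm must be positive")
    return a * dbh_cm ** b

def carbon_storage_kg(dbh_cm, a, b, carbon_fraction=CARBON_FRACTION):
    """Carbon stored in a tree, as a fixed fraction of its estimated biomass."""
    return carbon_fraction * biomass_kg(dbh_cm, a, b)

def annual_sequestration_kg(dbh_cm, dbh_growth_cm, a, b):
    """Carbon gained in one year, from the biomass increment implied by DBH growth."""
    next_year = carbon_storage_kg(dbh_cm + dbh_growth_cm, a, b)
    return next_year - carbon_storage_kg(dbh_cm, a, b)

# Hypothetical coefficients and tree measurements, for illustration only.
A_COEF, B_COEF = 0.1, 2.5
street_trees_dbh_cm = [12.0, 25.0, 40.0]

total_storage = sum(carbon_storage_kg(d, A_COEF, B_COEF) for d in street_trees_dbh_cm)
total_annual = sum(annual_sequestration_kg(d, 0.5, A_COEF, B_COEF)
                   for d in street_trees_dbh_cm)
print(f"carbon storage: {total_storage:.1f} kg")
print(f"annual sequestration: {total_annual:.1f} kg per year")
```

If DBH and species could be extracted reliably from SV images, the per-tree inputs to such a calculation would come from the image-derived inventory rather than from field surveys, with the coefficients looked up per species.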

Conclusion
SV is an increasingly used tool for assessing greenery and its ES in urban areas. Pixel-based GVI and SVF are the most common urban greenery data extracted from SV. With the development of deep learning and the increasing availability of training samples, the classification of SV image pixels is showing a trend of increasing accuracy and automation. These urban greenery data from SV have been applied to the management of urban green infrastructure and to research on green inequality. Furthermore, there have been many cases in which GVI was used to evaluate the ES of urban greenery. In particular, the human-eye-like perspective of GVI makes it widely used in studies of cultural services related to residents, for example in analysing the impact of urban greenery on residents' health and well-being, as well as in health-oriented urban planning. SVF is used in exploring the effects of urban greenery on urban climate, especially the effect of trees on reducing solar radiation and the cooling effect in street canyons.
Limited by current deep learning techniques, object-based assessment is currently less common than pixel-based assessment, but has the potential to play an increasingly significant role in urban green infrastructure management and large-scale research on ES of urban greenery (especially street trees). Compared to traditional on-site surveys, it can collect tree information with less workload and time investment. Tree information can also be used for assessing various ES, such as air pollutant removal and carbon sequestration. The urban-scale assessment and mapping of these ES are current research gaps, and we believe that with the help of deep learning techniques, the application of SV in these areas has a strong potential to fill these gaps, and will also play an important role in urban tree management and environment-oriented green infrastructure planning.
In conclusion, SV is a tool with unique advantages for urban greenery research. Urban greenery assessment based on SV is more accessible for modelling and reflecting people's actual greenery exposure, and it can make highly efficient use of research time. However, this assessment method also has shortcomings, so it cannot completely replace traditional research methods like remote sensing (NDVI) and field surveys, and these shortcomings limit its current application to the assessment of urban greenery's ES. The limitations include temporal and spatial availability, and low accuracy due to factors such as image resolution, distortion, and researchers' experience. Moreover, the accuracy of deep learning for object-based research has yet to be proved. In the future, with image updates from SV providers and new images photographed to fill spatial and seasonal gaps, as well as the improvement of image quality and machine learning models, SV will play an increasingly significant role in the assessment of urban greenery, especially greenery in proximity to roads such as street trees.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.