Street view imagery in urban analytics and GIS: A review

• Street-level imagery has become ingrained as an important urban data source.
• Most comprehensive review on street view imagery in geospatial and urban studies.
• We have screened 619 papers to identify the state of the art, focusing on applications.
• 250 studies are classified into 10 application domains and span dozens of use cases.


Introduction
Street view imagery (SVI) has gained strong momentum in urban studies in the last few years. This development was largely propelled by the proliferation of SVI data (the growing coverage and development of services such as Google Street View), advances in machine learning and computer vision that enable the automatic extraction of a variety of information, and the growing computing power that facilitates processing large amounts of images.
As SVI now covers half of the world's population (Goel et al., 2018), it provides a valuable large-scale source of urban data, often replacing field visits with virtual audits (Badland, Opit, Witten, Kearns, & Mavoa, 2010; Berland & Lange, 2017). SVI has enabled examining visual features from the human (horizontal) perspective, which is not provided by other frequently used data sources such as aerial or satellite imagery (Fig. 1). In fact, SVI has been described as a counterpart of remote sensing imagery (Zhang, Wu, Zhu, & Liu, 2019).
Since the early days of services providing large-scale SVI, researchers recognised that it is well suited for assessing characteristics of the built environment (Kelly, Wilson, Baker, Miller, & Schootman, 2013). As such, it has been embraced across numerous domains. Over the years, SVI has been used for enhancing applications on contrasting sides of the spectrum of urban studies, e.g. real estate valuation (Law, Paige, & Russell, 2019), demographic studies (Gebru et al., 2017), collecting data on pedestrian counts (Yin, Cheng, Wang, & Shao, 2015), understanding crime (McKee et al., 2017), analysing accessibility (Hara, Le, & Froehlich, 2013), and mapping infrastructure defects (Chang et al., 2017).
Thanks to the wide coverage and fine spatial sampling of various SVI providers, comparative studies among cities around the world and the creation of indicators and indexes to rank them have also emerged (Naik, Philipoom, Raskar, & Hidalgo, 2014; Li et al., 2015; Long & Liu, 2017). Furthermore, street-level imagery has proven valuable in conjunction with other sources of data such as social media (Ye, Zhang, Mu, Gao, & Liu, 2020), for creating new geospatial data (e.g. mapping trees (Seiferling, Naik, Ratti, & Proulx, 2017)), and for enhancing existing datasets (e.g. inferring the type of a building from SVI to enrich a building dataset (Kang, Körner, Wang, Taubenböck, & Zhu, 2018)).
In this paper, we review current applications of street view imagery in studies related to the urban context and the built environment and synthesise the most recent advances in the field, together with aspects such as limitations and research opportunities. To the best of our knowledge, this is the most comprehensive and wide-ranging review paper on this topic.
In Section 2, we briefly describe existing reviews. In Section 3, we give an overview of SVI services to explain their differences and similarities, which is relevant for our service-agnostic exploration. In Section 4, we describe the methodology of our systematic review. Section 5 summarises the quantitative insights of the review. The core of the paper is Section 6, which describes the state of the art of applications of SVI in urban analytics, systematically organised by the application categories that we derived during our review. Section 7 discusses the state of the art by summarising the obtained insights, key lessons learned, and common challenges and issues, and it outlines research opportunities. Finally, Section 8 concludes the paper with takeaways.

Related work
To the best of our knowledge, there have been three review papers published in international scientific outlets that may be considered to be related to ours.
In their review, Ibrahim, Haworth, and Cheng (2020) underscore the role of computer vision in understanding the interactions in the built environment. The review cuts across several topics (e.g. satellite imagery, algorithms), with street view imagery not being its principal focus. Our review paper specifically zeroes in on SVI and provides a comprehensive review of the state of the art, predominantly focusing on its applications. Kang, Zhang, Gao, Lin, and Liu (2020) provide a review of the use of SVI for sensing urban environments in public health studies. Besides asserting the importance of SVI in auditing the built environment and examining the relationship between the environment and health outcomes, their paper also summarises the key aspects in which (predominantly commercial) SVI differs from other forms of urban data: (1) large coverage thanks to omnipresent map service providers; (2) relatively homogeneous quality, sampling, and resolution; (3) free and efficient access to the data; (4) reliable and rich metadata; and (5) capture of the urban scenery from a human perspective.
The paper of Rzotkiewicz, Pearson, Dougherty, Shortridge, and Wilson (2018) is another review focused on health research. They underline that the strong points of SVI are its low cost, ease of use, and time savings, while its weaknesses are image resolution and limited spatial and temporal availability in developing regions. Finally, they highlight that studies from South America, Africa, and rural areas are scarce. Our review confirms this observation: during our exploration, we collected metadata on the geographical coverage of the studies, which we discuss in Section 5.
Our review shows that health studies are indeed a common application of SVI, but just one among many. Thus, we expand on the aforementioned reviews by providing a broad, holistic, and comprehensive overview of the opportunities that SVI provides in a wide range of urban studies and geospatial applications.

Overview of major services
At the moment, there are dozens of street view services, most of them regional, covering one or a few countries. This section describes the key services, primarily those with worldwide coverage, with details that will aid in understanding different aspects discussed later in the paper.

Google Street View
Google Street View (GSV) is arguably the most well-known and widespread service providing SVI (Fig. 2). Barring rare exceptions such as backpack-mounted cameras used to survey narrow roads, the panoramic imagery is acquired in a standardised manner: from a car with multiple cameras mounted on its roof, accompanied by various sensors including lidar (Anguelov et al., 2010). Since its launch in 2007, Google Street View has reached coverage of more than 90 countries and has also expanded into indoor spaces. The vast majority of imagery provides omnidirectional coverage and is taken from public roadways, except for a number of landmarks and some unconventional locations such as the International Space Station. The service can be accessed through the web interface integrated with Google Maps, through smartphone apps, and through an API (e.g. the images in Fig. 2 were downloaded through the Google Street View Static API). It is important to note that, unlike the web service, the API does not allow fetching historical imagery, and it provides imagery at a lower resolution.

Fig. 1.
Illustration indicating the edge street view images have over those derived from aerial/satellite platforms, which have traditionally been used to extract spatial information. SVI pivoted the usual perspective from vertical to horizontal, enabling new insights into the built environment and facilitating new applications.
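As an illustration of the API access mentioned above, the following Python sketch builds Google Street View Static API request URLs that split one panorama into several views by heading, similar in spirit to the images in Fig. 2. The endpoint and the `size`, `location`, `heading`, `fov`, `pitch`, and `key` parameters follow the public Static API documentation; the function name, the number of views, the coordinates, and the placeholder key are illustrative assumptions. The documentation limits `size` to 640 pixels per side, which is consistent with the lower resolution noted above.

```python
from urllib.parse import urlencode

# Public endpoint of the Street View Static API.
BASE_URL = "https://maps.googleapis.com/maps/api/streetview"


def static_view_urls(lat, lon, api_key, n_views=4, fov=90, pitch=0, size="640x640"):
    """Return one request URL per view, spreading headings evenly around 360 degrees."""
    urls = []
    for i in range(n_views):
        params = {
            "size": size,                      # width x height in pixels
            "location": f"{lat},{lon}",        # latitude,longitude near the panorama
            "heading": (i * 360) // n_views,   # compass heading of the camera (degrees)
            "fov": fov,                        # horizontal field of view (degrees)
            "pitch": pitch,                    # up/down angle relative to the vehicle
            "key": api_key,                    # placeholder; a real API key is required
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


# Illustrative coordinates in central Singapore and a placeholder key (hypothetical).
for url in static_view_urls(1.3048, 103.8318, "YOUR_API_KEY"):
    print(url)
```

The sketch only constructs the requests; actually fetching them requires a valid key and is subject to Google's terms of service.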

Crowdsourced services: Mapillary and KartaView
Mapillary and KartaView (until November 2020 known as OpenStreetCam) are the remaining two services with a global focus. They both rely on crowdsourced imagery and are owned and operated by commercial entities. Because of their intrinsic similarities, they are described together.
Anyone can contribute to Mapillary and KartaView, and the data can be used freely, as both are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. Because contributors are free to upload their data to both platforms, some images can be found in both services. Both services are closely related to the OpenStreetMap (OSM) project and have been used as a data source for mapping in OSM (Leon & Quinn, 2019). For example, Mapillary allows a special licensing arrangement so that its imagery can be used as a source for acquiring data in OSM, and it is integrated into some OSM editors, facilitating the mapping and tagging of features. This can be quite beneficial, as SVI makes it possible to discern a multitude of information that cannot be obtained from the traditionally used satellite imagery (see Fig. 1). Data from Mapillary has also been used to construct the well-known Mapillary Vistas Dataset, an annotated training dataset for semantic segmentation of street scenes (Neuhold, Ollmann, Rota Bulo, & Kontschieder, 2017), which can be used, e.g., for training automotive AI systems (self-driving vehicles).
In many ways, these crowdsourced SVI platforms are similar to GSV, yet also significantly different from it, offering some advantages but having disadvantages as well.
The services are open to nearly any kind of SVI taken with any suitable equipment from different moving platforms. For example, the images in Fig. 3 cover the same location and view as the GSV example in Fig. 2. The Mapillary image (Fig. 3a) was acquired with a smartphone, presumably by a contributor sitting in the first row of the upper deck of a public bus, while the KartaView imagery (Fig. 3b) was obtained with a dashcam mounted in a private vehicle.
An advantage of such crowdsourced services, also dubbed volunteered SVI, may be higher temporal resolution and coverage in places where GSV is not available (Mahabir, Schuchard, Crooks, Croitoru, & Stefanidis, 2020). That is, at the micro scale, they may include imagery from pavements, cycle tracks, and walkways, while at the large scale they may offer coverage in cities and countries where GSV and other commercial services are not available. Ma, Fan, Li, and Ding (2019) performed an exploratory analysis of Mapillary data. One of their main findings is that, in contrast to GSV, a significant portion of images has been collected by users while walking and cycling. Furthermore, while the data and users have global coverage, coverage is especially ample in Europe and North America. Given the crowdsourced nature of Mapillary and KartaView, spatial sampling is one of the main factors differentiating these platforms from Google Street View and other commercial services, which tend to have full coverage of cities and relatively homogeneous sampling (Quinn & León, 2019).
Another notable difference is that both services allow downloading imagery of the same location contributed by different users, taken with different equipment at different times. This means that in some locations the temporal resolution of the imagery will be finer than that of GSV, which is typically acquired every few years and offers restricted access to older imagery. On the note of downloading, a further advantage over GSV is that the imagery can be fetched at a higher resolution.
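To illustrate how multi-contributor imagery can yield finer temporal resolution, the sketch below groups image metadata records into small grid cells and lists the distinct capture years per cell. The record fields (`lat`, `lon`, `captured_at`) and the cell size are hypothetical and do not reproduce the actual Mapillary or KartaView metadata schema; a cell of 0.0005° spans roughly 55 m north-south.

```python
import math
from collections import defaultdict
from datetime import date


def capture_years_by_cell(images, cell_deg=0.0005):
    """Map each grid cell (row, col) to the sorted distinct years in which it was photographed."""
    cells = defaultdict(set)
    for img in images:
        # Snap coordinates to a regular lat/lon grid so nearby images share a cell.
        key = (math.floor(img["lat"] / cell_deg), math.floor(img["lon"] / cell_deg))
        cells[key].add(img["captured_at"].year)
    return {key: sorted(years) for key, years in cells.items()}


# Hypothetical metadata: three captures of one spot in different years, one elsewhere.
images = [
    {"lat": 1.30480, "lon": 103.83180, "captured_at": date(2016, 5, 1)},
    {"lat": 1.30481, "lon": 103.83182, "captured_at": date(2018, 9, 12)},
    {"lat": 1.30482, "lon": 103.83181, "captured_at": date(2020, 2, 3)},
    {"lat": 1.29000, "lon": 103.85000, "captured_at": date(2019, 7, 20)},
]
for cell, years in capture_years_by_cell(images).items():
    print(cell, years)
```

A fixed grid is the simplest choice but can split nearby images across a cell boundary; clustering by distance would avoid that at the cost of extra complexity.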
When it comes to the nature and quality of imagery, there are a few key aspects to note. First, more often than not, the imagery is not panoramic as in GSV. It is frequently acquired with dashcams recording the front view of the road rather than the streetside, typically offering a narrower field of view than GSV imagery and consequently limiting the insights that can be extracted (cf. Fig. 2 and Fig. 3). Second, because of the large differences among contributors and the equipment they use, the quality of the imagery is inevitably highly heterogeneous. For example, note the reflection on the windshield in Fig. 3b, as the imagery was recorded from inside a vehicle. Third, the positional accuracy of the data is not always high, which may cause issues in mapping applications (Krylov & Dahyot, 2019). Fourth, at the moment, the spatial coverage of these user-contributed services is not nearly as comprehensive as that of GSV, which is a notable limitation.

Fig. 2.
Example of street-level images in Google Street View, which are part of the same panorama (Orchard Road, Singapore; September 2020). ©2021 Google.

Fig. 3.
Comparison of images of the same location as in Fig. 2, obtained from the two crowdsourced services, contributed by users using different equipment and platforms.

Tencent Street View and Baidu Total View (China)
Having overviewed the services with a global focus, in this section we turn to local ones. Our review reveals (Section 5) that a sizeable portion of papers were conducted using two SVI providers in China: Tencent Street View and Baidu Total View. Therefore, we give a brief overview of these two examples of local services.
Baidu Maps is a web mapping service provided by Baidu, which can be considered the counterpart of Google Maps for China. Since 2013, it has offered a street view service, Baidu Total View. While the coverage of satellite imagery and maps in Baidu Maps extends beyond China, SVI is available only for China. Tencent Maps is a similar local service provided by Tencent, which has offered SVI under the name Tencent Street View since 2011.
Since GSV is not available in mainland China due to business restrictions (Liang et al., 2017), and because Tencent and Baidu street view services are in many ways equivalent to GSV (e.g. they are panoramic and they offer API access) (Long & Liu, 2017; Williams, Xu, Tan, Foster, & Chen, 2019), it is not a surprise that researchers focusing on Chinese cities have been taking advantage of them in their studies. Researchers also assert that efforts developed using these local services should be replicable using GSV as well (Cheng et al., 2017).
Local services in many other countries are also similar to GSV, and, as in the case of China, they might have coverage in specific places where GSV is not available. For example, GSV is not available in Morocco, but Carte.ma Streetview, a local service, covers about 10 major cities; and while GSV is available in Yerevan, Armenia, Yandex has notably denser coverage there. However, as will become evident in Section 5, our review uncovers that the two Chinese services mentioned in this section are virtually the only two local data sources featured in international peer-reviewed literature.

Overview and time frame
In identifying papers relevant for this review, we followed the common systematic review methodology, which is also in line with the latest review papers published in the field (e.g. Berthon, Thomas, & Bekessy, 2021; Chatzimentor, Apostolopoulou, & Mazaris, 2020). That is, we selected a few relevant keywords to fetch the initial pool of papers, which we screened to sift out those that are not relevant for this review. Afterwards, we focused on the papers identified as relevant and extracted information from them. Considering our aim to review the most recent advances in the field, we focused on papers published in the last three years (2018, 2019, and 2020). At the same time, to ensure that our review is sufficiently diverse and captures most, if not all, of the applications of SVI, we randomly sampled papers published before this time frame and ascertained that their applications are already covered by papers from the aforementioned period. The papers published in the last three years mainly continued research on the same applications, and also introduced new ones. The details of this process are described in the remainder of this section.

Search criteria
To identify an initial pool of papers, we searched Scopus for all recent publications that contain the relevant keywords 'street-level imagery' and 'street view' in their title, abstract, or keywords. We noticed that in the literature the terms street view, street-level image, and street-level imagery are common and used interchangeably, so we used these terms in the search. Using these broad terms ensures the number and diversity of papers required to capture the breadth of applications, and also strengthens the discussion of accompanying topics such as research opportunities.
While the keyword 'street view' is generic, it also captures all papers mentioning 'Google Street View' and 'Tencent Street View'. For that reason, one might argue that the review will be biased towards these services. However, this is not the case. To make sure that our search includes a wide range of papers and is not biased towards the aforementioned services, we also searched for a couple of other specific services. For example, searching for 'mapillary' gives 31 results, while searching for 'openstreetcam' gives only 3 publications (for comparison, searching for 'google street view' returns hundreds of publications). We realised that these papers have either already been captured by our initial search with the generic keywords or, if not, are almost exclusively focused on topics outside the scope of this review (e.g. published in computer vision outlets and describing research not relevant to this review). Furthermore, we picked a couple of local services and browsed through the literature to identify papers mentioning them, but to no avail, as most searches did not yield any results at all. For example, searching the literature for Malaysia's Urban Explorer and Kuwait Finder does not return a single publication relevant to this review.
The search and the review were performed during the second half of 2020, with the final query executed on 14 November 2020. It yielded 619 publications.
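The exact query string is not reported here. A hypothetical reconstruction of a search over the three interchangeable terms can be assembled with Scopus's `TITLE-ABS-KEY` field code, which searches titles, abstracts, and keywords, with double quotes denoting a loose phrase. The helper function below is an illustrative assumption, not the authors' tooling.

```python
def scopus_query(terms, field="TITLE-ABS-KEY"):
    """Join search terms into a single Scopus advanced-search expression."""
    # Double quotes request loose phrase matching of each multi-word term.
    phrases = " OR ".join(f'"{term}"' for term in terms)
    return f"{field}({phrases})"


# The three interchangeable terms named in the text.
TERMS = ["street view", "street-level image", "street-level imagery"]
print(scopus_query(TERMS))
# TITLE-ABS-KEY("street view" OR "street-level image" OR "street-level imagery")
```

Because 'street view' is matched as a phrase, such an expression also matches 'Google Street View' and 'Tencent Street View', as discussed above.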

Selection criteria, screening, and extraction of information
Afterwards, we screened the abstracts of the papers in the initial pool to create a corpus of those relevant for this review, following three criteria: (1) the study was conducted within an urban context; (2) the paper is in English; and (3) the study is not predominantly a computer vision paper (e.g. one that advances a machine learning method and uses SVI only for testing purposes). Almost all papers fulfilled the first two criteria; the few exceptions were excluded because they do not focus on the built environment and urban context, e.g. using GSV for agricultural monitoring (d'Andrimon, 2018). More than half of the papers were chiefly computer science articles rather than urban or mapping studies, so they were excluded.
Out of the 619 initial publications, 250 have been carried forward for the review. During the review, for each paper, we have extracted several characteristics (e.g. street view service that was used, geographical coverage, open science aspect, and the number of images used in the study), which we summarise in Section 5.
As it is the case with other systematic reviews, we acknowledge that there is a possibility that we have inadvertently excluded some relevant papers. Nevertheless, considering the large number and variety of papers that we have reviewed, we are confident that our review does not suffer from significant bias, it is sufficiently representative of the current trends in this domain, and it presents a stringent and comprehensive snapshot of the state of the art.

Taxonomy and thematic clusters
After examining all relevant studies, we developed a meaningful categorisation of them. The delineation of applications in review papers such as this one is complex and may be subjective (Biljecki, Stoter, Ledoux, Zlatanova, & Çöltekin, 2015), a difficulty compounded by the very diverse and intertwined landscape of research on this topic.
We delineated the papers by topic into 10 categories: greenery, urban morphology, transportation and mobility, socioeconomic studies, real estate, walkability, health and well-being, urban perception, spatial data infrastructure, and other. The state of the art of applications of SVI is described according to these domains (Section 6). To do justice to the breadth of urban applications of SVI that we identified while keeping the paper to a reasonable length, we mention most of them briefly to create an inventory and describe a diverse subset in detail. For further information on a particular application, the reader is referred to the rich list of references.

Results
This section describes the general insights and statistics of the screened papers. First, Fig. 4 indicates the temporal evolution of the number of papers in the last 15 years in our initial pool, suggesting a steady upward trend of papers relying on SVI.
In Fig. 5, we include the share of street view services (i.e. data sources). The first key observation is that GSV dominates research projects, thanks to its coverage and data quality, being used in about two thirds of studies. The next important insight is that second and third place go to the Chinese services Baidu and Tencent, and not to volunteered services, as one might expect. Despite the increasing coverage and open data policy of Mapillary and KartaView, and their popularity in the community, it appears that crowdsourced imagery has not yet gained currency in urban studies, likely because these services are not yet complete enough and have issues such as heterogeneous quality and a small share of panoramic imagery. Many of the methods and applications in the identified papers require consistent coverage and quality, especially if computer vision techniques are used. It is also relevant to note that virtually all studies rely on a single data source, with just a few exceptions (e.g. Krylov & Dahyot (2019) use both GSV and Mapillary in a comparative study on positioning objects detected in imagery).
Given the fine spatial sampling of street view services, cities may be covered by hundreds of thousands of images. Thanks to advancements in computer vision and the availability of computing power, it is possible to analyse such large amounts of data. Where provided, we extracted for each study the number of images used in the data analysis (Fig. 6); indeed, a large portion of the studies have analysed thousands of images. Moreover, more than a dozen studies process more than a million images. For example, in a study producing neighbourhood summaries of conditions across the United States, Nguyen et al. (2020) analyse 164 million images. Fig. 7 illustrates the open science aspect of this research domain. Nearly a third of papers are published as open access, while the situation with data and code is far less favourable. Only a fraction of studies offer open data or open-source code, inhibiting replication and reproducibility. For an example of a paper whose resulting data was released as open data, see Toikka, Willberg, Mäkinen, Toivonen, and Oksanen (2020), in which a dataset describing the visibility of vegetation in Helsinki was generated by analysing SVI.
The geographical aspect of the studies is also important to consider. For each study, we noted the spatial extent in focus. The most frequent case is that a study focuses on a single city: such cases account for 80% of papers. We identified 89 unique locations, which are mapped in Fig. 8. There have also been comparative studies that focus on multiple locations, in most cases a few or several cities. For example, Fu, Jia, Zhang, Li, and Zhang (2019) use imagery from Baidu to extract scene perception characteristics, examine their influence on housing prices, and compare the results between Beijing and Shanghai. However, there are also studies including dozens of locations. For example, Goel et al. (2018) analyse imagery from 34 cities in Great Britain to predict travel patterns. Among the studies that include multiple locations, 44% are international, i.e. they include cities from different countries.
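Summary statistics of this kind (service shares, the share of single-city studies, and the share of international multi-location studies) can be derived from per-paper metadata like that extracted during the review. The sketch below assumes a hypothetical record layout, and its four toy records are invented for illustration; they are not the review's data. A paper using several services counts once towards each of them, so service shares may sum to more than one.

```python
from collections import Counter


def summarise(papers):
    """Compute service shares, single-city share, and international share of multi-city studies."""
    n = len(papers)
    # Count each service once per paper, even if it is listed twice.
    service_counts = Counter(s for p in papers for s in set(p["services"]))
    single = [p for p in papers if len(p["cities"]) == 1]
    multi = [p for p in papers if len(p["cities"]) > 1]
    # A multi-location study is international if its cities span more than one country.
    international = [p for p in multi if len({country for _, country in p["cities"]}) > 1]
    return {
        "service_share": {s: c / n for s, c in service_counts.items()},
        "single_city_share": len(single) / n,
        "international_share_of_multi": len(international) / len(multi) if multi else 0.0,
    }


# Invented toy records (hypothetical layout): services used and (city, country code) pairs.
papers = [
    {"services": ["GSV"], "cities": [("Singapore", "SG")]},
    {"services": ["Baidu"], "cities": [("Beijing", "CN"), ("Shanghai", "CN")]},
    {"services": ["GSV", "Mapillary"], "cities": [("Dublin", "IE")]},
    {"services": ["GSV"], "cities": [("London", "GB"), ("Helsinki", "FI")]},
]
print(summarise(papers))
```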
Despite the availability of data, there are clear gaps in most of Asia, South America, and Africa, but also in much of Europe. Some of these locations have on a few occasions been the subject of large-scale comparative studies, but not of publications focusing solely on them. Such gaps present a scientific opportunity, e.g. at least inviting replications of studies carried out elsewhere. Fig. 9 reveals the share of categories according to our taxonomy (Section 4.4). No application is predominantly popular (i.e. accounting for more than a quarter of publications), though there are significant differences in their prevalence.

Spatial data infrastructure
The application domain with the largest number of publications in the recent period is the use of imagery for creating and maintaining spatial data infrastructures. While the studies presented in the subsequent sections also extract objects from images, this category of research does so predominantly or solely for mapping purposes, i.e. purely to collect spatial data. SVI presents a significant opportunity to keep maps updated, so it is no surprise that a large number of papers have explored this potential.
While the publications presented in this section stop short of an analysis or urban study and do not use SVI for a purpose beyond data collection, they tend to be more detailed on the methodology and performance of the data collection. They may also provide ideas for future urban studies that might take advantage of particular information that has not been explored yet.
Many of the identified studies focus on buildings. Aside from some exceptions such as mapping buildings themselves (Ogawa et al., 2019), these studies focus on extracting building characteristics to improve semantic completeness. As building information in crowdsourced venues such as OpenStreetMap is often sparse (Biljecki, 2020), such techniques for increasing the completeness of attributes might contribute to use cases requiring them.
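A minimal sketch of such attribute enrichment, assuming SVI-based predictions are already available as (label, confidence) pairs keyed by building identifier: missing values are filled only when the prediction is confident enough, existing values are never overwritten, and completeness is measured before and after. The field names, identifiers, and the 0.8 threshold are hypothetical.

```python
def enrich(buildings, predictions, attr="building_type", min_conf=0.8):
    """Fill missing `attr` values in place from (label, confidence) predictions; return count filled."""
    filled = 0
    for b in buildings:
        pred = predictions.get(b["id"])
        # Only fill genuinely missing values, and only with confident predictions.
        if b.get(attr) is None and pred is not None and pred[1] >= min_conf:
            b[attr] = pred[0]
            filled += 1
    return filled


def completeness(buildings, attr="building_type"):
    """Share of buildings with a non-missing value for `attr`."""
    return sum(b.get(attr) is not None for b in buildings) / len(buildings)


# Hypothetical building records and SVI-derived predictions keyed by building id.
buildings = [
    {"id": 1, "building_type": "residential"},
    {"id": 2, "building_type": None},
    {"id": 3},
    {"id": 4, "building_type": "commercial"},
]
predictions = {1: ("commercial", 0.95), 2: ("residential", 0.90), 3: ("school", 0.40)}

before = completeness(buildings)
n_filled = enrich(buildings, predictions)
print(before, n_filled, completeness(buildings))
```

Keeping existing values untouched is a deliberate choice here: it treats the source dataset as authoritative and uses SVI only to fill gaps.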
The type, condition, and function of buildings appear to be the key building characteristics that have been the subject of research (Gonzalez et al., 2020; Laupheimer, Tutzauer, Haala, & Spicker, 2018; Yu et al., 2020). This is often done in combination with aerial or satellite imagery (Hoffmann, Wang, Werner, Kang, & Zhu, 2019). For example, Li, Chen, Rajabifard, Khoshelham, and Aleksandrov (2018) demonstrate the estimation of the year of construction of buildings from GSV images in Victoria, Australia. The age of a building is a critical piece of information for energy demand and retrofit studies; therefore, their method can be used to enrich building datasets lacking this attribute and thereby enable such studies. SVI has also been used to estimate the height and number of floors of a building, which can be used to generate its 3D model (Kim & Han, 2018; Taubenböck, Kraff, & Wurm, 2018; Kraff, Wurm, & Taubenböck, 2020). However, the accuracy has not been reported. On that note, Bruno and Roncella (2019) investigated 3D reconstruction from GSV but report hit-or-miss results. Generating 3D models from GSV has been a long-standing topic of interest, with papers dating back to the early days of this subject (Torii, Havlena, & Pajdla, 2009). A distinct work presented by Kim, Kim, and Choi (2019) demonstrates inferring characteristics of cities from SVI and passing them into a procedural modelling engine to generate 3D city models. However, their methodology generates data of imaginary cities rather than of the real world.

Fig. 4.
The rapid increase of urban studies using SVI. Note that the year 2020 is excluded from the plot since it is not complete and Scopus might still be adding 2020 papers well into 2021. However, the number of papers published so far (at the time of submission of this paper in November 2020) suggests that it will continue the upward trend, exceeding the year 2019.
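The cited works' geometry is not described here, but the basic idea behind estimating height from SVI can be shown with a simple, hypothetical model. Assuming flat ground, a camera at height h_c (about 2.5 m is assumed here for a car-mounted rig), a known horizontal distance d to the facade (e.g. from a building footprint), and the pitch angle θ from the horizontal to the roofline, the building height is H = h_c + d·tan θ. Dividing by an assumed storey height (3 m here) gives a rough floor count. This is an illustration only, not the method of the studies cited above.

```python
import math


def building_height(distance_m, roof_pitch_deg, camera_height_m=2.5):
    """Facade height from the pitch angle to its roofline; assumes flat ground and a known distance."""
    return camera_height_m + distance_m * math.tan(math.radians(roof_pitch_deg))


def floor_count(height_m, storey_height_m=3.0):
    """Rough number of floors, assuming a uniform (hypothetical) storey height; at least one."""
    return max(1, round(height_m / storey_height_m))


h = building_height(distance_m=15.0, roof_pitch_deg=45.0)
print(round(h, 2), floor_count(h))
```

For a facade 15 m away whose roofline appears 45° above the horizontal, the model gives 17.5 m, or about six storeys. Errors in the distance or pitch propagate directly into H, which is one reason accuracy reporting matters.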
Wang, Kang, and Zhu (2018) combine SVI with spaceborne synthetic aperture radar (SAR) data to generate 3D building models in Berlin. Their work points to challenges such as the borderline quality of images, but indicates that by fusing multiple datasets one can leverage the particular advantages of each.
Further studies focused on extracting building characteristics include detecting graffiti artwork on facades (Novack, Vorbeck, Lorei, & Zipf, 2020; Tokuda, Cesar, & Silva, 2019) and identifying commercial establishments (Peng, Gao, Xiao, Guo, & Yang, 2018). Noorian, Psyllidis, and Bozzon (2019) use GSV to classify the type of points of interest (stores) located in buildings. Their method largely relies on extracting text from storefronts and classifying the stores into 22 categories, such as bookstore and pharmacy. For related work, see Noorian, Qiu, Psyllidis, Bozzon, and Houben (2020). Srivastava, Muñoz, Lobry, and Tuia (2018) utilise GSV to predict land use (e.g. educational, hospital, religious; derived from OSM data) in France, with preliminary work in the Netherlands (Srivastava, Vargas-Muñoz, Swinkels, & Tuia, 2018). They report mixed results, as the accuracy depends on the class due to the similarity and overlap between classes. In an extension of the work, Srivastava, Vargas-Muñoz, and Tuia (2019) investigate the fusion of aerial and ground views, improving the accuracy of the predictions. Among other reasons, this study is notable for assessing the contribution of SVI relative to other sources of urban data. Further studies focused on land use and urban zoning classification include the publications of Huang, Qi, Kang, Su, and Liu (2020), Feng et al. (2018), Karasov, Külvik, Chervanyov, and Priadka (2018), and Chang et al. (2020).
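Srivastava, Vargas-Muñoz, and Tuia (2019) fuse aerial and ground views within a deep learning model. A much simpler stand-in that conveys why fusion can help is late fusion: a weighted average of the class probabilities predicted separately from each view. In the sketch below, the classes, probabilities, and equal weighting are hypothetical, and the technique is not the cited authors' architecture.

```python
import numpy as np


def late_fusion(p_ground, p_aerial, w_ground=0.5):
    """Weighted average of per-class probabilities from two views; returns (fused, best class index)."""
    p_ground = np.asarray(p_ground, dtype=float)
    p_aerial = np.asarray(p_aerial, dtype=float)
    fused = w_ground * p_ground + (1.0 - w_ground) * p_aerial
    return fused, int(np.argmax(fused))


classes = ["educational", "hospital", "religious"]
ground = [0.40, 0.45, 0.15]  # hypothetical street-level classifier output
aerial = [0.70, 0.20, 0.10]  # hypothetical aerial classifier output
fused, best = late_fusion(ground, aerial)
print(fused, classes[best])
```

When the two views disagree, as here, the fused prediction can overturn an uncertain ground-level guess ('hospital') with a more confident aerial one ('educational').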
Likely because road data is nowadays largely complete and easily obtainable (Barrington-Leigh & Millard-Ball, 2017), roads are seldom mapped from SVI, but they have been the subject of classification and semantic enrichment. For example, Marianingsih and Utaminingrum (2018) investigated using GSV images to classify the road surface type (e.g. asphalt, gravel). Related work has been done on the classification of street types (Zhang, Siriaraya, Kawai, & Jatowt, 2020), assessing the quality of roads and detecting defects (Chacra & Zelek, 2018), and quality control of road data (Zhou & Lin, 2020).
Though most of the studies described in this section are focused solely on mapping features or enriching attribute information, there are instances that demonstrate the use of the collected data for change detection (Branson et al., 2018; Peng et al., 2018; Revaud, Heo, Rezende, You, & Jeong, 2019), presenting an opportunity for maintaining spatial data infrastructure and providing possibilities for studying urban development.
Researchers report that the reliability and accuracy of localisation depend greatly on the imagery and the object. For example, Krylov et al. (2018) use GSV imagery to detect utility poles and traffic lights with a success rate above 90% and report a positional accuracy of 2 m. Branson et al. (2018) use the same source to map and classify trees. Their method detects about 70% of trees, with a positional error below 2 m in 70% of cases, which is reaffirmed by a similar study conducted by Li and Yao (2020). In the study of Peng et al. (2018) focused on mapping shops in buildings, more than 80% of shops are correctly recognised, and an average positional accuracy of 8 m is achieved.
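Figures such as 'below 2 m in 70% of cases' are typically computed by matching each detected object to a reference object and measuring the distance between them. The sketch below assumes the matching has already been done and computes great-circle (haversine) errors and the share of matches under a threshold; the coordinates are invented, and the 2 m threshold merely mirrors the values quoted above.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def share_within(pairs, threshold_m=2.0):
    """Positional errors of matched (detected, reference) pairs and the share below threshold_m."""
    errors = [haversine_m(*det, *ref) for det, ref in pairs]
    share = sum(e < threshold_m for e in errors) / len(errors)
    return share, errors


# Invented matched pairs: ((lat, lon) detected, (lat, lon) reference).
pairs = [
    ((1.30000, 103.80000), (1.30000, 103.80000)),  # exact match
    ((1.30001, 103.80000), (1.30000, 103.80000)),  # about 1.1 m north
    ((1.30010, 103.80000), (1.30000, 103.80000)),  # about 11 m north
]
share, errors = share_within(pairs)
print(round(share, 3), [round(e, 1) for e in errors])
```

A detection rate (such as 'detects about 70% of trees') would additionally require the number of reference objects that found no match, which this sketch leaves out.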
In conclusion, the studies suggest a high potential of using SVI for creating and maintaining spatial databases, especially for features that contributors to crowdsourced geoinformation tend to focus on less, such as lamp posts and traffic signs. However, the studies presented in this section almost always focus on a specific location. As the type and appearance of certain features may vary considerably across geographies (Thirlwell & Arandjelović, 2020), some of these studies may be challenging to replicate in other locations or may yield different performance. Furthermore, positional accuracy remains a challenge inhibiting the generation of highly accurate spatial datasets, mostly due to noise and difficulties in localisation (Cheng et al., 2018; Krylov & Dahyot, 2019).
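One source of the localisation difficulties mentioned above becomes apparent in the simplest case, where an object seen in two images is located by intersecting the two viewing rays. The sketch below works in a local planar east/north frame in metres, with bearings in degrees clockwise from north; it is a generic two-view triangulation, not the pipeline of the cited studies. When the rays are nearly parallel or the intersection falls behind a camera, no reliable position exists, and small bearing errors can move the intersection considerably.

```python
import math


def _cross(ax, ay, bx, by):
    """z-component of the 2D cross product (a x b)."""
    return ax * by - ay * bx


def triangulate(p1, bearing1_deg, p2, bearing2_deg, eps=1e-9):
    """Intersect two viewing rays (camera position, bearing) in a local east/north frame."""
    d1 = (math.sin(math.radians(bearing1_deg)), math.cos(math.radians(bearing1_deg)))
    d2 = (math.sin(math.radians(bearing2_deg)), math.cos(math.radians(bearing2_deg)))
    denom = _cross(*d1, *d2)
    if abs(denom) < eps:
        return None  # (near-)parallel rays: no reliable intersection
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = _cross(dx, dy, *d2) / denom  # distance along ray 1
    t2 = _cross(dx, dy, *d1) / denom  # distance along ray 2
    if t1 <= 0 or t2 <= 0:
        return None  # intersection lies behind at least one camera
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# Two hypothetical camera positions 10 m apart, both sighting the same pole.
print(triangulate((0.0, 0.0), 45.0, (10.0, 0.0), 315.0))
```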

Greenery
The publication landscape is dominated by applications that extract vegetation from SVI to study urban greenery and related topics, e.g. thermal comfort, aesthetic conditions, and health. Studies that predominantly focus on one of these related themes (e.g. the association of greenery and obesity) are described under that theme instead. Because greenery extraction underpins so many of them, we place this section at the beginning of our review to provide a foundation for understanding the subsequent sections. This section overviews studies chiefly focused on understanding vegetation in the urban context using SVI.
For the most part, studies measure the amount of near-road greenery (e.g. a relative measure of the vegetation visible at street level at a location) at a city scale or across multiple cities. These measurements inform urban planning interventions and other applications such as planning tree maintenance, planting efforts, and greenway networks (Cai, Li, Seiferling, & Ratti, 2018). A common measure used to quantify urban greenery and evaluate the visibility of urban forests is the Green View Index (GVI) (Yang, Zhao, Mcbride, & Gong, 2009). The GVI focuses on the pedestrians' view of greenery and can be extracted from SVI with computer vision techniques, largely semantic segmentation (Li et al., 2015; Stubbings, Peskett, Rowe, & Arribas-Bel, 2019). This approach has been used in dozens of studies (Dong, Zhang, & Zhao, 2018; Wu, Gong, Liang, Sun, & Zhang, 2020; Xia, Yabuki, & Fukuda, 2020), and an example is given in Fig. 10. There have also been alternative but related developments and modifications (Chen, Meng, Hu, Zhang, & Yang, 2019; Yu, Zhao, Chang, Yuan, & Heng, 2018; Lauko, Honts, Beihoff, & Rupprecht, 2020; Labib, Huck, & Lindley, 2020), such as an index quantifying the ability to see urban street greenery during transportation (Wu, Cheng, Chu, Xia, & Li, 2019).
Many of the studies focus on a single city, but some cover multiple territories and provide comparative analyses. Further, almost all studies focus on a single time period, with exceptions such as an analysis of the temporal change of the GVI in a city (Li, 2020).
Much of the research on quantifying urban greenery relies on satellite imagery. However, SVI has an unparalleled advantage over it when the objective is to assess profile views of street greenery and to understand what people see on the ground, which most remote sensing methods cannot capture (Li et al., 2015). Nevertheless, studies often use SVI in combination with remotely sensed data, such as airborne lidar and satellite observations, to obtain a complete picture of urban greenery both near roads and in parks and off-street yards (Barbierato, Richards & Wang, 2020; Gu, Chen, & Dai, 2019) or to validate other methods (Kumakoshi, Chan, Koizumi, Li, & Yoshimura, 2020).
A few detailed examples of studies follow. Li, Ratti, and Seiferling (2018) quantify the contribution of street trees to shading, developing a method that relies on GSV. Besides the aforementioned advantage of the pedestrian perspective, the researchers argue that SVI is preferable to aerial or satellite imagery because overhead imagery cannot fully capture the shading effectiveness of street trees. Using a segmentation technique that detects the portion of sky in each panorama, their study estimates the sky view factor at about 300 locations in Boston as a proxy for shade. They compare locations with different amounts of canopy cover and account for obstructions caused by buildings, and the results suggest that street trees decrease the sky view factor by 18.5%. Studies such as this one contribute to understanding the role of greenery in increasing thermal comfort.  measure the visible street greenery at a city scale in Singapore, at a very dense resolution (more than 180 thousand locations). The quantified greenery, extracted and classified from GSV imagery, was coupled with values of pedestrian accessibility, which were quantified using a street network from OSM. One benefit of a study such as this one is better-informed urban planning interventions, e.g. establishing priority locations for greening.
Although the methodology for quantifying urban greenery overlaps substantially among the identified studies, the range of applications is wide, and we describe them under each respective theme. In the remainder of this section, we feature two general studies that fit best here. Chen, Zhou, and Li (2020) calculate the green view in several cities in South China using data from Baidu. They find a positive correlation with socio-economic indicators such as GDP and public revenue, indicating the importance of the financial power of cities when constructing public green space. Wang, Hu, Tang, and Zhuo (2020) examine greenery in Beijing, finding a negative correlation with population density and mixed results regarding its association with housing prices.
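The sketch below illustrates how the GVI is commonly computed from semantic segmentation. It pools vegetation pixels across several images captured at one location (e.g. at different headings) and divides them by the total number of pixels. We understand this ratio of sums to follow the formulation of Li et al. (2015), but the code is only an illustration. The label ids are an assumption: we use Cityscapes-style training ids, in which 8 denotes vegetation. Other models use different label maps, and some studies also count terrain or grass classes.

```python
from typing import Iterable, Sequence

import numpy as np

# Assumption: Cityscapes-style training ids, where 8 = "vegetation".
# Check the label map of the segmentation model actually used.
VEGETATION_IDS = (8,)


def green_view_index(label_maps: Iterable[np.ndarray],
                     green_ids: Sequence[int] = VEGETATION_IDS) -> float:
    """GVI (%) for one location from per-pixel class labels of several images.

    Vegetation pixels are summed over all images and divided by the total
    pixel count, i.e. a ratio of sums rather than a mean of per-image ratios.
    """
    green = 0
    total = 0
    for labels in label_maps:
        green += int(np.isin(labels, green_ids).sum())
        total += labels.size
    if total == 0:
        raise ValueError("no pixels supplied")
    return 100.0 * green / total


# Two tiny 2x2 "segmentations" taken at different headings.
views = [np.array([[8, 8], [0, 10]]), np.array([[8, 0], [0, 0]])]
print(green_view_index(views))  # 37.5
```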

Health and well-being
Health studies applying SVI are plentiful and form a major theme in our review. It is therefore not surprising that the two review papers outlined in Section 2 focus solely on this domain. Researchers in this domain recognise SVI as an important source for deriving a variety of indicators of built environment characteristics, which can be analysed to assess their association with health and well-being. Such results may be used to inform public health officials and policymakers so that they can address issues and improve structural factors (Phan et al., 2020; Javanmardi et al., 2020).
Many health and well-being studies rely on quantifying greenness exposure, so this section partly extends the previous one. For example, many studies call attention to the association between physical activity and greenery in a neighbourhood (He, Lin, Yang, & Lu, 2020; Yang et al., 2019; Villeneuve et al., 2018). Such relationships were studied long before SVI services became available, primarily thanks to the global availability of the satellite-derived normalized difference vegetation index (NDVI). However, the proliferation of SVI has enabled analyses at a larger scale and the inclusion of streets rather than only parks and other green spaces (Lu, 2019). For example, Nguyen et al. (2018) use GSV to extract street greenness, crosswalks (as a sign of walkability), and building type, and derive indicators that describe the built environment at the zip code level in three cities in the United States. The study suggests an association between neighbourhood characteristics and the prevalence of obesity and diabetes, i.e. the areas with the greenest streets and the most crosswalks had a lower prevalence of obesity and diabetes.
In a large-scale multivariate study involving 31 million images at 7.8 million intersections in 416 cities in the United States, Keralis et al. (2020) extract several built environment indicators at each location. Besides the previously mentioned common indicators, they examine whether visible overhead utility wires (as a measure of physical disorder) and single-lane roads (as an indicator of a lower level of urban development) are linked to various health outcomes. Among other results, the study reveals that visible wires are associated with an increased prevalence of all the health-related behaviours and outcomes examined (e.g. diabetes, physical and mental distress, and drinking). Chen also investigated identifying indicators of neighbourhood physical disorder, such as defaced properties, litter, and abandoned cars, and linking them to health outcomes. Further studies focus on happiness (Hart et al., 2018), obesity (Li & Ghosh, 2018; Xiao, Zhang, Sun, Tao, & Kuang, 2020; Yang, Lu, Yang, Gou, & Zhang, 2020), stress (Jiang, Larsen, & Sullivan, 2020), and mental health (Hoffmann et al., 2019; Wang et al., 2019). Most studies overlap substantially in the features they extract from imagery (e.g. trees, crosswalks). Less commonly extracted characteristics of the built environment include sidewalk quality (Schootman et al., 2020; Gustat et al., 2020), recreational facilities, and street interface enclosure.

Fig. 10. Semantic segmentation is the predominant computer vision technique used to calculate the amount of greenery from street-level imagery. It is also frequent in studies in other thematic categories. The image in Fig. 3b was segmented using DeepLab, a deep learning model for semantic image segmentation, trained on the Cityscapes dataset (Cordts et al., 2016). The green portion of the overlaid mask represents the vegetation detected in the original image, which facilitates the quantification of indicators such as the GVI.
Another domain of studies related to health and well-being concerns infectious diseases. Andersson, Birck, and Araujo (2018, 2019) assert that the spread of diseases may be attributed to environmental factors, many of which can be sensed from SVI. In their papers, they focus on dengue fever. Relatedly, Haddawy, Wettayakorn, Nonthaleerak, Yin, and Wiratsudakul (2019) detect outdoor open containers (e.g. buckets and potted plants), which constitute potential breeding sites for dengue vectors. An application of their work is creating detailed dengue risk maps of large areas.
SVI has also been used to supplement movement trajectories to provide additional insight for health studies. For example, Li, Deal, Zhou, Slavenas, and Sullivan (2018) carried out a study to understand the mood of adolescents, in which the movement of the participants was tracked. SVI from GSV was matched to the logged locations to gather more information about the surroundings, and the results suggest that greater exposure to nature was associated with a better mood.
Finally, Egli et al. (2018) use GSV to examine food and beverage advertising around schools in Auckland, New Zealand, to determine the exposure of children to such ads.

Urban morphology
SVI is a powerful source for measuring urban form as perceived by a pedestrian in a street canyon (Middel, ...). Examples of related studies, which mostly focus on urban climate, follow. Hu, Zhang, Gong, Ratti, and Li (2020) used GSV for street canyon classification in Hong Kong, which provides a valuable input for understanding the impact of building density on microclimate. The study also reveals that the amount of sunshine in the image degrades classification performance, i.e. images of canyons with an East-West orientation may be classified less accurately than those with a North-South orientation. Along these lines, Li, Cai, Qiu, Zhao, and Ratti (2019) present a method to estimate sun glare using GSV panoramas. Their method segments the panoramas to detect obstructions that would block the glaring sun, and maps locations vulnerable to sun glare. One use case of their work is traffic safety, as sun glare is a frequent factor in traffic accidents. The method could therefore be used to predict, at a large scale, on which roads and at what times sun glare occurs, and this information could be integrated into navigation devices. A related study is that of Du, Ning, and Yan (2020), who estimate sunshine duration at different locations in street canyons.
A significant number of studies concentrate on estimating solar irradiation, the sky view factor (SVF), and related indicators of urban geometry. These indicators may be used for various purposes, from microclimate studies to understanding light pollution (Nice et al., 2020; Liang et al., 2020; Gong, Zeng, Ng, & Norford, 2019; Gong et al., 2018; Li, Duarte, & Ratti, 2019; Tang, Zhang, Chen, Wan, & Li, 2020; Zhang, Middel, & Turner, 2019; Khamchiangta & Dhakal, 2019; Sun et al., 2020; Zeng, Lu, Li, & Li, 2018). The studies mostly use approaches congruent with those described in the previous paragraph. In this list, we emphasise the work of Liang et al. (2020), who develop GSV2SVF, a software tool that calculates the SVF from GSV, and release it as open source. Their software also estimates the tree and building view factors.
Further studies on urban form that use SVI include the work of , which uses imagery to measure the amount of shade in outdoor recreation spaces, such as playgrounds and swimming pools. Data on the provision of shade in urban open spaces are important for supporting informed decisions in urban planning, and they have health implications such as skin cancer prevention.  posit that urban functions of streets extracted from SVI, such as the amount of open space and building enclosure, can be used in street quality assessment. Finally, Monteiro and Turczyn (2018) inspect GSV data manually to monitor the evolution of urban form.
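The core geometric test behind glare mapping can be sketched as follows. The inputs are the sun's azimuth and elevation (assumed to be precomputed with a solar position algorithm), the direction of travel, and a sky mask from a segmented equirectangular panorama. Glare is flagged when the sun is low, roughly ahead of the driver, and not blocked by an obstruction. This is not the published method of Li, Cai, Qiu, Zhao, and Ratti (2019). The thresholds (25° elevation and 25° offset from the heading), the function name and the panorama orientation convention are all hypothetical placeholders.

```python
import numpy as np


def sun_glare_risk(sky_mask: np.ndarray,
                   sun_azimuth_deg: float,
                   sun_elevation_deg: float,
                   travel_heading_deg: float,
                   north_col: float = 0.0,
                   max_elevation_deg: float = 25.0,
                   max_offset_deg: float = 25.0) -> bool:
    """Flag potential sun glare at one panorama location (hypothetical rule).

    sky_mask: boolean equirectangular mask (True = sky), rows spanning
    +90 deg elevation (top) to -90 deg (bottom), columns spanning 360 deg of
    azimuth increasing clockwise, with column `north_col` facing north.
    """
    # The sun must be above the horizon but low enough to shine into the driver's eyes.
    if not 0.0 < sun_elevation_deg <= max_elevation_deg:
        return False
    # Smallest angle between the sun's azimuth and the direction of travel.
    offset = abs((sun_azimuth_deg - travel_heading_deg + 180.0) % 360.0 - 180.0)
    if offset > max_offset_deg:
        return False
    # Look up the pixel where the sun would appear; glare only if it is sky.
    h, w = sky_mask.shape
    col = int(np.floor(north_col + sun_azimuth_deg / 360.0 * w)) % w
    row = min(int((90.0 - sun_elevation_deg) / 180.0 * h), h - 1)
    return bool(sky_mask[row, col])
```

In practice, the check would be repeated for each road segment and time step, using the sun position for each date and time.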
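The sketch below shows one way to turn a segmented equirectangular panorama into sky, tree and building view factors. Each pixel above the horizon is weighted by cos θ sin θ, where θ is the pixel's zenith angle. This discretises the cosine-weighted integral that defines the view factor for a horizontal surface, so a fully open hemisphere gives 1. The code illustrates the idea under stated assumptions and is not the GSV2SVF implementation, which we have not inspected; tools in this area often reproject panoramas to fisheye images first. The class ids are assumed Cityscapes training ids.

```python
import numpy as np

# Assumption: Cityscapes-style training ids.
SKY, VEGETATION, BUILDING = 10, 8, 2


def view_factors(labels: np.ndarray) -> dict:
    """Sky, tree and building view factors from an equirectangular label map.

    Rows span zenith angles 0..180 deg from top to bottom; only the upper
    half (the sky hemisphere) is used. Each row is weighted by
    cos(theta) * sin(theta), the cosine-weighted solid angle for a
    horizontal surface, so a fully unobstructed hemisphere gives 1.0.
    """
    h = labels.shape[0]
    upper = labels[: h // 2]
    zenith = np.deg2rad((np.arange(h // 2) + 0.5) * 180.0 / h)  # row centres
    row_w = np.cos(zenith) * np.sin(zenith)
    weights = np.broadcast_to(row_w[:, None], upper.shape)
    total = weights.sum()
    return {
        name: float(weights[upper == cls].sum() / total)
        for name, cls in (("sky", SKY), ("tree", VEGETATION), ("building", BUILDING))
    }
```

The weight handles two effects at once. Pixels near the top of an equirectangular panorama are heavily stretched and cover little solid angle, and pixels near the horizon contribute little to a horizontal surface.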

Transportation and mobility
Considering that SVI is captured along streets, transportation and mobility studies are unsurprisingly another major application area. Most use cases in this domain revolve around traffic safety, as SVI provides a convenient source to conduct virtual street audits and extract characteristics of roads (Hong, McArthur, & Raturi, 2020). Hu, Wu, Huang, Peng, and Liu (2020) investigate clusters of pedestrian crashes, and explore the relationship between crashes and road infrastructure characteristics. They gather several variables on roads from SVI, such as number of lanes, road surface condition, and width of the sidewalk. SVI is found beneficial, as it may provide additional attributes on the road network that are not typically available in traditional GIS datasets. For related studies on using SVI in the context of pedestrian safety and crashes see the papers of Mooney et al. (2020), Nesoff et al. (2018), Kwon and Cho (2020), and Isola et al. (2019).
Cycling safety has been the subject of research as well. For example, Cicchino et al. (2020) look into variations of protected bike lanes (e.g. the degree of physical barriers) to understand their relationship with cyclist crashes and falls. Emergency department patients who had fallen or crashed while cycling reported the cycling infrastructure characteristics, and GSV was used to confirm them.
Besides safety, the remaining use cases identified in this domain are quite diverse. For example, SVI was found useful by  for automated assessment of pedestrian volume at a large geographic scale. Using a machine learning technique, they count the number of pedestrians in images, approximating the pedestrian volume at different locations. In their comparative study, the researchers assert that SVI can replace the laborious field observations traditionally used. However, they also expose limitations of SVI, most importantly that each image captures pedestrians only at a single point in time.
Transportation and mobility behaviour has been another major area of research in this domain, with several published studies spanning multiple transportation modes (Lu, Sarkar, & Xiao, 2018; Goel et al., 2018; Lu, 2018; Zang et al., 2020; Ibrahim, Haworth, & Cheng, 2019). For example, den Braver et al. (2020) recognise that the degree of car usage is explained by individual characteristics but is also largely driven by characteristics of the neighbourhood environment. Many of these indicators have been gathered from SVI, e.g. the density of speed bumps, pedestrian crossings, and traffic lights. Moving on to cycling, several studies explain cycling patterns, e.g.  and Lu, Yang, Sun, and Gou (2019) establish the relationship between cycling behaviour and exposure to greenery, while Verhoeven et al. (2018) examine cyclists' route preferences, revealing the influence of speed limits and of the architecture of buildings along cycling routes.

Walkability
SVI is particularly useful in walkability studies because it allows virtually walking down street segments to assess how conducive they are to walking (Steinmetz-Wood, El-Geneidy, & Ross, 2020), and it may provide information that is not found in other commonly used data sources in this domain (Yencha, 2019;Biagi, Brovelli, & Stucchi, 2020).
A recurring topic is using SVI to quantify and assess how walkable the streets in a study area are (Blečić, Cecchini, & Trunfio, 2018; Nagata et al., 2020; Bartzokas-Tsiompras, Tampouraki, & Photis, 2020). For example, in a wide-ranging study set in New York City, which arguably spans multiple categories, Miranda et al. (2020) analyse 7.7 million images to understand safe pedestrian access and the role of the architectural style of buildings in the pedestrian's walking experience.
Because walkability has multiple facets, the studies investigate a wide range of physical aspects. These include the distribution of pedestrian sheds and walkways (Zhou & Xu, 2020; Cao, Liu, Li, Wang, & Qin, 2018), the quality of sidewalks and their accessibility (Plascak et al., 2019; Weld et al., 2019), the enclosure of street canyons (Li, Santi, Courtney, Verma, & Ratti, 2018), pedestrian crossings (Steinmetz-Wood, Velauthapillai, O'Brien, & Ross, 2019), traffic mirrors and streetlights (Hanibuchi, Nakaya, & Inoue, 2019), and aesthetics (e.g. flowers and garbage) (Christman, Wilson-Genderson, Heid, & Pruchno, 2019). Many of these aspects are analysed in combination. For example, Zhou, He, Cai, Wang, and Su (2019) present a composite index for walkability (Visual Walkability Index) based on four indicators calculated from segmented SVI, e.g. visual crowdedness and the amount of obstacles. They apply the index in Shenzhen, calculating it for several thousand sites. The results suggest great heterogeneity of visual walkability across the city.
Most of this work focuses on understanding the built environment characteristics associated with walkability, though there is also research on route recommendation (Wakamiya et al., 2019). Some research projects also collect walking data to verify actual movement (Shatu & Yigitcanlar, 2018).
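A composite index of this kind can be sketched as a weighted sum of min-max normalised indicators. Indicators for which lower values are better (such as visual crowdedness or obstacles) are inverted before summing. The indicator names, the equal default weights and the normalisation scheme are assumptions made for illustration. They are not the actual formulation of the Visual Walkability Index.

```python
from typing import Mapping, Optional, Sequence

import numpy as np


def composite_index(indicators: Mapping[str, Sequence[float]],
                    higher_is_better: Mapping[str, bool],
                    weights: Optional[Mapping[str, float]] = None) -> np.ndarray:
    """Weighted sum of min-max normalised indicators, one score per site.

    Hypothetical formulation for illustration only.
    """
    names = list(indicators)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}  # equal weights
    n_sites = len(next(iter(indicators.values())))
    score = np.zeros(n_sites, dtype=float)
    for n in names:
        x = np.asarray(indicators[n], dtype=float)
        span = x.max() - x.min()
        if span == 0:
            continue  # a constant indicator cannot discriminate between sites
        z = (x - x.min()) / span  # rescale to [0, 1]
        if not higher_is_better[n]:
            z = 1.0 - z  # e.g. more obstacles -> less walkable
        score += weights[n] * z
    return score


# Hypothetical per-site indicator values for three sites.
sites = {"greenery": [0.1, 0.3, 0.2], "obstacles": [5, 1, 3]}
print(composite_index(sites, {"greenery": True, "obstacles": False}))  # approximately [0, 1, 0.5]
```

Min-max scaling keeps indicators with different units comparable. However, it makes scores relative to the sites in the sample, so they cannot be compared directly across cities without a shared reference range.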
A documented downside of SVI in walkability studies is that most commercial imagery has been recorded from a platform that is higher than the typical pedestrian view (Steinmetz-Wood et al., 2019).

Socio-economic studies
Studies on the interaction of social and economic factors have taken advantage of SVI as well. Several examples are given.
The distinctive study of  analyses text identified in imagery. Detecting text in SVI, especially on storefronts, is not uncommon in studies; e.g. Hong (2020) analyses the diversity of languages in Seoul using SVI, and another example is given earlier in Section 6.1. However, this study largely focuses on detecting the typeface in SVI and shows its association with amenities, e.g. night clubs tend to have decorative typefaces. The study has a socio-economic aspect, suggesting that typeface can be used as a proxy to infer the economic and demographic status of urban regions, i.e. the prevalence of a certain typeface in an area is correlated with household income.  use GSV to investigate the shade provided by street trees in Boston and relate it to socio-economic characteristics. The study reports results on ethnic group and education, among others. It also suggests differences among age groups, indicating a positive correlation between the percentage of senior citizens at a location and the amount of shade provided by street trees.

Real estate
SVI has proven valuable for capturing information in the domain of real estate, primarily for valuation. Real estate valuation is intricate and prices are driven by numerous factors. Studies have therefore mostly used insights extracted from SVI to supplement traditionally used data, e.g. proximity to amenities (Hanibuchi et al., 2018). This increases the accuracy of predictions and/or offers additional insights, since SVI provides a glimpse of the appearance and visual characteristics of a property's surroundings, something that is not available in other datasets. For example, Johnson, Tidwell, and Villupuram (2019) use GSV data to analyse and quantify the curb appeal of residential properties in Denver. Their study suggests that curb appeal may add economically significant value to a house (7-14%), and the authors enable replication elsewhere by releasing their code and data. Law et al. (2019) is another example of a study where GSV imagery is used in combination with other data (e.g. housing attributes) to predict house prices. The traditionally used housing attributes, such as location accessibility, explain the majority of the variance in house prices, but augmenting the models with imagery improves their performance. However, the contribution of imagery is still dwarfed by conventionally used attributes such as floor area and age, which remain the main drivers of price. Further, the researchers cite the difficulty of quantifying the visual appearance of real estate, as well as geographical differences, as challenges that apply to most other studies in this domain.
Other identified studies that extract features from street view images for property value assessment are presented in the following publications: Bin, Gardiner, Li, and Liu (2020), , Law, Seresinhe, Shen, and Gutierrez-Roig (2020), Zhang and Dong (2018), Zhao, Liu, Kuang, Chen, and Yang (2018), Ye, Xie, Fang, Jiang, and Wang (2019), Fu et al. (2019) and . The last two mentioned studies are interesting to highlight because they include extracting an above-average number of characteristics from imagery, spanning greenery and urban morphology.
Another topic in this domain is gentrification. Because gentrification results in visible changes to the building stock, Ilic, Sawada, and Zarzelli (2019) have examined the use of deep learning and GSV for mapping and understanding the process. Their study focuses on inferring positive changes in the appearance of properties over a time period and mapping their concentration across a city. It demonstrates that it is possible to indicate where and when gentrification is occurring, at a reliable level of accuracy and at a fine spatial resolution. For a related study, see the publication of Lin and Yang (2019). Bochkarev and Smirnov (2019) automate the detection of advertisements and signage on building facades to identify illegal instances in St. Petersburg, and they propose a monitoring system for local authorities. More broadly, such work could also be used to infer advertising density or economic activity, which has been investigated by Ye, Wang, Kita, Xie, and Cai (2019).
Connealy (2020) has used SVI to understand trends in food retail, e.g. by detecting spatial clusters of food retailers. This multi-pronged study also addresses health and socio-economic aspects (e.g. the association between the prevalence of specific stores and income and health data). It also suggests that the work can be applied to quality assurance in the domain of spatial data infrastructure. However, the work appears to involve substantial manual effort, which inhibits large-scale applications.
Gobster, Hadavi, Rigolon, and Stewart (2020) provide a policy assessment of vacant land reuse strategies, by examining fine-scale residential landscape change of vacant lots that have been sold to residents. Their assessment method, which combines SVI and aerial imagery, includes 20 different aspects of land cover and condition, applying them to vacant lots one year before and after purchase. The study supports such policies as it indicates improved signs of condition and care of lots after purchase. In a subsequent study,  expand this research line and provide a framework for longitudinal monitoring of vacant lot programs using SVI.
Finally, in the realm of real estate, there are valuation studies that do not use computer vision techniques to extract a set of insights, but they rather use SVI to manually supplement missing data or verify existing data of properties (Tanaś, Trojanek, & Trojanek, 2019).

Urban perception
SVI has enabled characterising street spaces from a human perspective at a large scale. Thus, it has been used in a significant number of urban perception studies (Gong, Ma, Kan, & Qi, 2019). Many of these studies focus on less tangible and less measurable aspects, such as inferring urban function, vibrancy, and appearance. This contrasts with most of the research presented so far, and the results may be subjective (Zhang, Ye, Zeng, & Chiaradia, 2019; Wang et al., 2019; Alhasoun & Gonzalez, 2019). The central theme is measuring the perceived quality of the streetscape (Li & Long, 2019; Wu, Peng, Ma, Li, & Rao, 2020; Ye, Zeng, Shen, Zhang, & Lu, 2019). Researchers have been using SVI to measure urban perceptual attributes such as safety and wealth, vibrancy, comfort, and attitude towards greenery (Min, Mei, Liu, Wang, & Jiang, 2020; Yao et al., 2019; Wang et al., 2019; Fu & Song, 2020). Because of the nature of research in this domain, studies often involve human surveys (Ruggeri, Harvey, & Bosselmann, 2018) and may involve additional data such as audio clips (Verma, Jana, & Ramamritham, 2020).
The purpose of such research is largely to inform urban planning and design (Shen et al., 2018). Many studies also have more specific applications of understanding the perception of spaces, e.g. analysing physical activity, the influence of built form on the human physiological response (Gorgul, Chen, Wu, & Guo, 2019), predicting crime (Fu, Chen, & Lu, 2018; Oliveira & Hsu, 2018), understanding colour tendency (Kato & Matsukawa, 2019), understanding symmetries of urban blocks (Samiei et al., 2018), and identifying commercial hotspots and the popularity of locations. This thematic category also differs from the previous ones in the mix of features extracted. For example, to assess the general visual quality of urban space, Tang and Long (2019) examine SVI to infer the variation of the streetscape, while  measure motorisation.
This line of research reaffirms the value of SVI relative to other urban data, given the unprecedented opportunities it offers. Furthermore, there are related studies in which SVI has a secondary purpose. For example, researchers have worked on recognising urban functions and the quality of spaces from other data, such as social media data, and used SVI to either validate or augment the results (Bernetti et al., 2020; Zeng et al., 2019). Further examples of perception studies relying on SVI include analysing spaces over a period of time to understand visual changes and encroachment (Varghese, Gubbi, Ramaswamy, & Balamuralidhar, 2019), and quantifying the perception of traditional buildings. Finally, Yoshimura, He, Hack, Nagakura, and Ratti (2020) investigate how spatial layout is associated with spatial comprehension. They use GSV in a survey in which participants were asked to guess the location of particular street-level images.

Other
Finally, in this section we include an assorted collection of studies that are distinct enough not to belong to any of the above categories.
Mayer and Bechthold (2019) conduct a life cycle assessment study of buildings, in which SVI proves useful for extracting the fenestration of buildings that such studies require. Close to this topic, von Platten et al. (2020) use SVI to recognise building characteristics (e.g. façade insulation) that are required for estimating the energy retrofitting potential. Ganji, Minet, Weichenthal, and Hatzopoulou (2020) develop a model for air quality prediction based on built environment characteristics extracted mostly from satellite imagery. Interestingly, SVI is used to measure building heights, which serve as one of the predictors of air quality.

General observations
The melange of applications described in Section 6 reaffirms the versatility of SVI and the multiplicity of its uses, and the scope of applications is expected to grow. The geographical coverage and massive amounts of SVI provide an unprecedented opportunity to extract insights about the built environment that were previously unavailable or difficult to derive from other forms of urban data. The main drivers of the rapid increase in the use of SVI in urban studies in recent years were increased automation, the growth of computing power, the increased coverage of SVI data, and the use of deep learning techniques. Deep learning is now used routinely and has greatly accelerated feature extraction and image segmentation, which are essential for many studies presented in Section 6.
The review reveals that many studies have used SVI in conjunction with other datasets, such as social media, aerial/satellite imagery and more traditional geospatial datasets, complementing them and providing additional insight. In some of the studies, using SVI is not essential, but it has been taken advantage of for validation purposes or for improving the performance of predictions.
It was challenging to delineate the intertwined landscape of applications of SVI and segment papers into meaningful categories, and there are papers that cut across multiple domains, but such entanglement serves as a testament to the multidisciplinarity of this research topic.
In many instances in which SVI overlaps with other forms of data, it remains unknown whether SVI alone might be sufficient, or how it performs relative to other sources of data. An exception is the study of Mayer and Bechthold (2019). It uses multiple input datasets to understand the environmental impact of a building, such as building information obtained from a housing survey, and discusses the contribution of GSV. However, their study is very limited, focusing on only one sample.
During our review, we also noticed that the size of each thematic category is not proportional to innovation. For example, while urban greenery studies are plentiful, many of these papers are largely replications or offer minor incremental advancements. For that reason, we expect that we have also captured almost all applications featured in papers published before the temporal scope of our systematic review.
Computer vision is not the focus of this paper, which aims to understand the trends in and applications of SVI in urban analytics and GIS. That said, computer vision is an inseparable component, because much of the development is owed to advances in computer vision and computing capacity. We note two key aspects here. First, while the vast majority of relevant work uses computer vision techniques to process the massive amount of imagery, some papers use SVI without applying any AI (e.g. Aklıbaşında, 2019, on greenery). These are usually studies that require extracting possibly subjective insights, such as the perceived safety and condition of neighbourhoods (Mayne, Pellissier, & Kershaw, 2019; Plascak et al., 2020). Second, many papers do not go into much technical detail, which inhibits replication.
On that note, it is important to discuss the open science aspect. The lack of sharing of the developments (e.g. code, trained models) also inhibits reproducibility and replication in other geographical areas (e.g. our map in Fig. 8 gives a hint of unexplored locations) or in the same locations in future.
Considering the temporal aspect, it is important to note that longitudinal studies are very rare. The main reason for such gap is that most street view services, including GSV as the most popular one, do not allow retrieving historical imagery through API. There are studies that examine imagery from different periods (Connealy, 2020;Najafizadeh & Froehlich, 2018;Cândido, Steinmetz-Wood, Morency, & Kestens, 2018;Goel et al., 2018), but they collect the data manually from the web interface of GSV (which includes historical imagery) or through other means, rather than through the API (except the possibility that the imagery was collected through an API over a long period of time and archived, which may not be allowed by the service).
A perennial concern is licensing: studies use imagery from commercial services with restrictive licences to generate new data, which might conflict with the terms and conditions of such services, yet this topic is rarely discussed. Recent papers indicate that SVI providers are gradually restricting access (Fang et al., 2020; Nguyen et al., 2019), which might catalyse the development of crowdsourced SVI, a source that may alleviate such issues.

Issues
In this section, we note common issues and challenges that emerge from the reviewed studies and that are generic, rather than limited to a specific service.
Quality of images. Despite the presumed quality assurance mechanisms that mapping services have in place, the large number of images, varying environmental conditions and wide geographical coverage make the quality of images inevitably somewhat heterogeneous. Researchers cite specific issues such as dark images and poor lighting conditions, images that turn out not to be outdoors (e.g. tunnels and shops), blurriness, and heterogeneous weather (Law et al., 2019; Miranda et al., 2020; Lauko et al., 2020).
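As an illustration of how the first of these issues might be screened automatically, the following is a minimal sketch that flags dark images by their mean grey level. It assumes each image is already available as a grayscale array of 0–255 values, and the threshold of 40 is a hypothetical value chosen for illustration; the reviewed studies may rely on entirely different criteria.

```python
from statistics import mean

# Hypothetical threshold: images whose mean grey level (0-255 scale)
# falls below this value are treated as too dark to analyse.
DARK_THRESHOLD = 40.0


def mean_luminance(gray_pixels):
    """Return the mean grey level of an image given as rows of 0-255 ints."""
    return mean(value for row in gray_pixels for value in row)


def is_too_dark(gray_pixels, threshold=DARK_THRESHOLD):
    """Flag an image as unusable if its mean luminance is below the threshold."""
    return mean_luminance(gray_pixels) < threshold


def filter_dark(images, threshold=DARK_THRESHOLD):
    """Return the ids of (image_id, pixels) pairs that pass the darkness check."""
    return [image_id for image_id, pixels in images
            if not is_too_dark(pixels, threshold)]


# Toy 2x2 "images": one taken inside a tunnel, one on an open street.
images = [
    ("tunnel", [[5, 10], [12, 8]]),
    ("street", [[120, 200], [90, 160]]),
]
print(filter_dark(images))  # ['street']
```

A single global threshold is the simplest possible choice; it would not, for instance, catch blurriness or indoor scenes, which typically call for dedicated classifiers.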
Obstructions. Objects that are the focus of studies are often obstructed in imagery, e.g. by passing cars and people, and researchers frequently cite this issue (Novack et al., 2020; Bin et al., 2020; Najafizadeh & Froehlich, 2018). Vegetation seems to be the major hindrance, frequently obscuring buildings and other objects. But vegetation in imagery is both a blessing and a curse, as, on the other hand, one of the most common applications of analysing SVI is related to vegetation and greenery (Section 6.2). Another hindrance is range: large objects such as buildings tend to entirely obscure the space behind them (Fig. 1 illustrates this aspect and hints at the advantage satellite imagery has over SVI in this respect). Hence, in some locations, the effective range of acquisition might be limited.
Coverage. SVI services tend to have geographically dense coverage, but uneven coverage seems to be another major issue, across multiple scales. For example, user-contributed services such as Mapillary often lag behind commercial services in terms of completeness of roads. Commercial counterparts are not perfect either: while GSV has made great strides in the past decade and has reached an impressive level of coverage, it is, unlike its siblings in Google Maps such as satellite imagery and map data, still not available in about half of the countries worldwide, and in territories where it is available, smaller towns and rural areas might not always be included. Such omissions mean that studies tend to be tailored to cities (Szczepańska & Pietrzyk, 2020). Furthermore, in certain towns only major roads are imaged, leaving large extents uncharted.
Heterogeneous availability will inevitably result in heterogeneous mapping, which is a key downside for supporting the creation and maintenance of spatial data infrastructures. For research on the coverage of street view providers, the reader is referred to related studies (Quinn & León, 2019; Mahabir et al., 2020; Fry, Mooney, Rodríguez, Caiaffa, & Lovasi, 2020).
Update frequency. Besides the geographical coverage, the temporal coverage, i.e. the frequency of updates, seems to be a common issue as well (Miranda et al., 2020; Helbich et al., 2020). In certain areas, imagery is collected infrequently and is often outdated, so it is not frequent enough to support studies of the current status, updates to earlier analyses, or temporal analyses (e.g. change detection). This issue is compounded by the aforementioned observation that services generally do not enable querying historical imagery. It might also happen that different parts of the same city have been imaged in different periods, causing inconsistencies.
The time period of the collection of the imagery has also been cited as an issue. First, the capture of the imagery may not match the desired study period, or it may not match the period of other datasets used in a study. Second, the time of year when the data was collected may itself be an issue and lead to bias (Larkin & Hystad, 2018). For example, in a study on the relationship between greenery and physical activity, Helbich et al. (2020) note that the imagery offered by commercial services was captured during winter months, which might not be appropriate for certain analyses (e.g. those that require measuring the level of greenery). As a solution, they collected their own imagery, one of the rare instances of this that we have encountered (Fig. 5).
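To make these temporal concerns concrete, the sketch below filters images by capture date against a study window and excludes leaf-off months, and reports the spread of capture years as a rough indicator of parts of a city being imaged in different periods. It assumes that a capture date is available for each image as part of its metadata; the record type, function names and the set of excluded months are hypothetical and would need to be adapted to the study area's climate and the service at hand.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical leaf-off months for a temperate Northern Hemisphere city;
# the appropriate set depends on the climate of the study area.
LEAF_OFF_MONTHS = {11, 12, 1, 2, 3}


@dataclass
class ImageRecord:
    image_id: str
    captured: date  # capture date from the image metadata (assumed available)


def usable_for_greenery(records, start, end, excluded_months=LEAF_OFF_MONTHS):
    """Keep ids of images captured within [start, end] outside leaf-off months."""
    return [r.image_id for r in records
            if start <= r.captured <= end
            and r.captured.month not in excluded_months]


def capture_year_span(records):
    """Years between the oldest and newest capture; a large span hints at
    parts of the same city having been imaged in different periods."""
    years = [r.captured.year for r in records]
    return max(years) - min(years)


records = [
    ImageRecord("a", date(2019, 7, 14)),  # summer, within the study window
    ImageRecord("b", date(2019, 1, 20)),  # within the window, but leaf-off
    ImageRecord("c", date(2014, 6, 2)),   # summer, but outdated
]
print(usable_for_greenery(records, date(2018, 1, 1), date(2020, 12, 31)))  # ['a']
print(capture_year_span(records))  # 5
```

Excluding whole months is a blunt proxy for vegetation phenology; it is shown here only to illustrate how capture-time bias could be made explicit before analysis.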
Non-panoramic images. Many applications focus on understanding the built environment alongside roads, e.g. the frontage of buildings (Fig. 2). Such a perspective has been facilitated by the panoramic images offered mostly by commercial services. However, only a fraction of crowdsourced SVI is panoramic, as most of it has been acquired by dashcams inside vehicles pointing in the direction of driving (Fig. 3). Having only non-panoramic imagery in an area significantly limits the insights that can be derived and precludes applications that require imagery including the street-side profile. This limitation could be one of the key reasons why volunteered SVI services are still used far less than commercial ones in urban studies (Fig. 5).
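The geometric reason behind this limitation can be sketched with a simple field-of-view test: a target is visible only if its bearing from the camera lies within half the horizontal field of view of the camera heading. The sketch assumes local planar coordinates in metres (east, north), and the 60° field of view for a forward-facing dashcam is a hypothetical value used for illustration; occlusion is ignored.

```python
import math


def bearing_deg(dx_east, dy_north):
    """Compass bearing (0 = north, clockwise) from the camera to a target offset."""
    return math.degrees(math.atan2(dx_east, dy_north)) % 360.0


def in_view(heading_deg, fov_deg, dx_east, dy_north):
    """True if the target lies within the camera's horizontal field of view."""
    # Signed angle between the viewing direction and the target, in [-180, 180).
    offset = (bearing_deg(dx_east, dy_north) - heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= fov_deg / 2.0


# Vehicle heading north; a building facade 10 m to the east and 2 m ahead.
print(in_view(0.0, 60.0, 10.0, 2.0))   # False: outside a forward dashcam's view
print(in_view(0.0, 360.0, 10.0, 2.0))  # True: a panorama covers all directions
# The same facade seen from 30 m back along the road falls inside the view.
print(in_view(0.0, 60.0, 5.0, 30.0))   # True
```

In practice, a forward-facing camera may still capture a facade obliquely from further down the road, but at a shallow angle and lower resolution than a panorama taken alongside it.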

Research opportunities
We maintain that there are plenty of further research opportunities in this area. The application of SVI appears to be saturated in some topics, such as analysing vegetation. Nevertheless, we postulate that there are further research opportunities even in these domains. For example, as greening initiatives and urban farming are multiplying around the world (Palliwal, Song, Tan, & Biljecki, 2021; Wu & Biljecki, 2021), it would be worthwhile to explore using SVI for monitoring greenery in buildings and other forms of greening efforts in cities.
In the domain of spatial data infrastructure (Section 6.1), SVI offers further opportunities, as many aspects remain uninvestigated. For example, extracting other urban features is a likely research direction. A notable research gap is left by studies that demonstrate mapping objects and their characteristics but do not use the extracted data in a particular analysis. Such research directions would also galvanise accompanying topics, such as understanding bias in mapping from SVI, the impact of image quality, and the propagation of error. Furthermore, SVI seems to have seldom been used as an independent data source for spatial data quality assessment, especially of OpenStreetMap: although a variety of spatial objects and their characteristics have been extracted from SVI, they have rarely been used for spatial data quality control. The liberal licence of volunteered SVI platforms would allow such uses. However, caution should be exercised, as volunteered SVI is in many cases still inferior to commercial services such as GSV; e.g. it suffers from positional issues that propagate into inaccurate localisation of detected objects (Krylov & Dahyot, 2019).
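Why positional errors carry over can be illustrated with a common geometric simplification (not necessarily the approach of Krylov & Dahyot, 2019): a detected object is localised by offsetting the recorded camera position along the bearing to the object by an estimated distance. The sketch assumes local planar coordinates in metres, and the function names and numbers are hypothetical; under these assumptions, an error in the recorded camera position shifts the object by exactly the same vector, even when bearing and distance are estimated perfectly.

```python
import math


def locate_object(cam_east, cam_north, bearing_deg, distance_m):
    """Project an object's position from the camera position, the bearing to
    the object (0 = north, clockwise) and an estimated distance, in a local
    planar coordinate system measured in metres."""
    theta = math.radians(bearing_deg)
    return (cam_east + distance_m * math.sin(theta),
            cam_north + distance_m * math.cos(theta))


def propagated_error(true_cam, recorded_cam, bearing_deg, distance_m):
    """Distance between the object positions derived from the true and the
    recorded (erroneous) camera positions."""
    e1, n1 = locate_object(*true_cam, bearing_deg, distance_m)
    e2, n2 = locate_object(*recorded_cam, bearing_deg, distance_m)
    return math.hypot(e2 - e1, n2 - n1)


# A hypothetical 5 m positioning error (3 m east, 4 m north) of the camera,
# with an object detected 12 m away due east.
err = propagated_error((0.0, 0.0), (3.0, 4.0), 90.0, 12.0)
print(round(err, 6))  # 5.0: the camera offset carries over unchanged
```

Errors in the estimated bearing or distance would add further displacement on top of this, which is one reason why error propagation deserves attention when SVI-derived objects are used for quality control.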
As noted earlier, a small number of studies rely on manual work rather than artificial intelligence techniques. Automating such studies presents a viable research opportunity, possibly enabling larger scopes or replication elsewhere, among other advantages. Furthermore, it would be beneficial to apply the latest developments in the machine learning community, which might not have been used much on SVI. Among the rare instances we have identified, Kauer, Joglekar, Redi, Aiello, and Quercia (2018), Joglekar et al. (2020), and Wijnands, Nice, Thompson, Zhao, and Stevenson (2019) use generative adversarial networks (a technique that can be used for image style transfer). The first two use them for urban beautification, while the third is focused on understanding the design of streetscapes in relation to health and wellbeing outcomes. Such techniques are seldom investigated, hence they might offer further research opportunities.
As both commercial and volunteered services are increasingly using less common platforms, such as scooters, bicycles, and pedestrians, to expand the coverage of imagery to locations that may not be reached by cars, we believe that new applications may surface and that existing ones will be enhanced. For example, as such platforms will enable imaging locations previously out of reach, such as tertiary roads, pedestrian zones, walkways and cycling paths, we expect to see an increase in applications such as assessing walkability and bikeability, covering infrastructure hitherto not evaluated. Furthermore, with the densifying coverage at the micro-scale, such as capturing narrow and less prominent roads, the morphological, architectural, and socio-economic diversity of SVI may increase. The same goes for the growing availability of indoor data, which has been severely underexplored in research so far. For example, GSV has recently added indoor imagery of all 114 large food centres in Singapore, collected with a camera system mounted on a wearable backpack (National Environment Agency, 2019). The use of such imagery in research is yet to be explored.
Finally, as most studies have been conducted within a single city, a generic research opportunity is scaling these research efforts beyond individual cities, replicating them in other cities, and conducting comparative studies across multiple cities.

Conclusion
We have provided an extensive review of the use of street-level imagery in urban studies and mapping, through the examination of 250 recently published papers. We highlight three takeaways to conclude this paper, which we believe is the most comprehensive review of the diverse role of street view imagery in urban analytics and GIS.
First, street view imagery is certainly here to stay. It has been entrenched in studies under the umbrella of urban analytics for a while. As this urban data source has gained considerable momentum, and the supporting infrastructure (e.g. services, volume and coverage of data, computer vision techniques) continues to develop and strengthen, the number of papers and applications is expected to keep growing in the foreseeable future (Fig. 4). However, access to data should not be taken for granted. Our review reveals that the vast majority of studies rely on commercial services, and there is no guarantee that these services will remain easily accessible to researchers in the future.
Second, while the majority of recent papers rely on Google Street View, which continues to expand into new locations, new players in the market² and the expansion of volunteered street view imagery may open new horizons and bring enhancements to the data, such as greater and finer coverage (incl. indoors), fewer licensing ambiguities, and increased temporal resolution, potentially contributing to the emergence of new use cases (see Section 7.3).
Third, street view imagery offers a source for maintaining spatial data infrastructures (Section 6.1). Besides the clear community and commercial interest, it remains to be seen whether national mapping agencies will adopt it and treat it as a data source akin to conventional ones such as aerial imagery and point clouds. Further, in the context of SDI, another area of interest is high-frequency SVI, which would dramatically increase the temporal resolution of recording the same locations through frequent data collection from platforms plying the streets, such as taxis, public transport vehicles, and garbage trucks. This idea has been tested recently with a variety of sensors, bringing improvements in urban sensing (Anjomshoaa et al., 2018; O'Keeffe, Anjomshoaa, Strogatz, Santi, & Ratti, 2019; deSouza et al., 2020).
However, it appears that optical imagery is yet to be investigated in this setting, and we predict that it might bring enhancements and novelties for applications such as change detection.