Sensitivity of measuring the urban form and greenery using street-level imagery: A comparative study of approaches and visual perspectives

Street View Imagery (SVI) is crucial in estimating indicators such as Sky View Factor (SVF) and Green View Index (GVI), but (1) approaches and terminology differ across fields such as planning, transportation and climate, potentially causing inconsistencies; (2) it is unknown whether the regularly used panoramic imagery is actually essential for such tasks, or we can use only a portion of the imagery, simplifying the process; and (3) we do not know if non-panoramic (single-frame) photos typical in crowdsourced platforms can serve the same purposes as panoramic ones from services such as Google Street View and Baidu Maps for their limited perspectives. This study is the first to examine comprehensively the built form metrics, the influence of different practices on computing them across multiple fields, and the usability of normal photos (from consumer cameras). We overview approaches and run experiments on 70 million images in 5 cities to analyse the impact of a multitude of variants of SVI on characterising the physical environment and mapping street canyons: a few panoramic approaches (e.g. fisheye) and 96 scenarios of perspective imagery with variable directions, fields of view, and aspect ratios mirroring diverse photos from smartphones and dashcams. We demonstrate that (1) disparate panoramic approaches give different but mostly comparable results in computing the same metric (e.g. from R=0.82 for Green View to R=0.98 for Sky View metrics); and (2) often (e.g. when using a front-facing ultrawide camera), single-frame images can derive results comparable to commercial panoramic counterparts. This finding may simplify typical processes of using panoramic data and also unlock the value of billions of crowdsourced images, which are often overlooked, and can benefit scores of locations worldwide not yet covered by commercial services. Further, when aggregated for city-scale analyses, the results correspond closely.


Introduction
Largely thanks to panoramas provided by commercial sources such as Google Street View and Baidu Maps, street view imagery (SVI) rose to prominence in urban and environmental studies, introducing efficient, large-scale, and convenient means to sense the built environment and streetscapes (Deng et al., 2021; Guan et al., 2022; Seiferling et al., 2017; Qiu et al., 2022; Helbich et al., 2021; Kang et al., 2021b; Liang et al., 2023). Among many applications, it has been monumental for characterising the urban form and measuring the amount of vegetation, serving a variety of disciplines that depend on understanding the physical environment quantitatively.
A good deal of approaches quantifying urban morphology and street greenery were possible before the advent of SVI with other and well established types of data such as vector geospatial datasets and satellite imagery (Miao et al., 2020; Gong et al., 2018; Li et al., Index (GVI), and Building View Factor (BVF), dominate the multidisciplinary literature (Gong et al., 2018; Lu et al., 2023). The general tenor of the approaches is straightforward: the proportion of an image that is detected to represent a particular urban form element (e.g. greenery, sky) is considered as a proxy of the exposure to it at that particular (ground-level) location. With the large-scale availability of street-level imagery and readily available computing resources, the approach is often scaled using tens of thousands of photos across a city to conduct urban scale studies at a high spatial resolution. These simple but powerful metrics have accelerated a great number of studies to understand their interaction with urban phenomena such as urban heat island effect, comfort, housing prices, mental health, and mobility choices (a comprehensive overview will be given in Section 2).
Despite such studies now being almost routinely conducted around the world, there are two trends and factors that warrant investigation, which are expected to become even more important in the future.
First, the proliferation of such studies exposed diverging approaches, and there is a lack of overview of the different instances used and their comparison. For example, researchers use substantially different perspectives (e.g. a panorama or fisheye image) and various parameters to calculate seemingly the same metrics, and there is a lack of clarity in terminology and measures (e.g. Green View Index and Green View Factor both appear in the literature). While SVI continues to be a staple of numerous studies, researchers tend to follow certain approaches without motivating their choice and discussing their implications, uncertainty, and reliability.
Second, studies predominantly rely on commercial platforms that offer panoramic imagery, but crowdsourced (citizen-led) efforts of collecting street-level images and platforms to store them (e.g. Mapillary, KartaView) have emerged as well, gaining attention for the same research purposes, similarly to other urban data such as climate measurements (Meier et al., 2017). While offering advantages such as flexibility and countering restrictions of commercial services, their notable shortcoming is the field of view and coverage: most volunteers use non-panoramic cameras, e.g. dashcams and smartphones mounted in cars, offering a limited (single-frame) view. It has not been systematically investigated whether such imagery can rival the well established panoramic imagery that gives a much broader overview of the streetscape. Moreover, this uncertainty is compounded by the large variety of consumer equipment used in practice, e.g. cameras have different fields of view and aspect ratios.
Conflating the two seemingly disconnected aspects, Fig. 1 illustrates the background and motivation for this research. Commercial services offer panoramic imagery (360 degrees horizontally and 180 degrees vertically) enabling a broad perspective and extensive survey of the streetscape and the built environment in the immediate vicinity (Fig. 1A and B). The panorama is often transformed to another view, e.g. a set of perspective images based on various parameters (Fig. 1C) or a fisheye image directed to the zenith (Fig. 1D). Images are segmented using computer vision techniques (Fig. 1C, D, and G) to compute urban form indexes such as the Sky View Factor and Green View Index based on the proportion of each class in the image (Fig. 1 bottom right), and sometimes aggregated. These metrics support numerous application domains, but the difference among various approaches to derive them has not been investigated. On the other hand, due to the consumer equipment and heterogeneous approaches used (Fig. 1E), the vast majority of crowdsourced imagery has only a single-frame (limited) and perspective view (Fig. 1F, G) both horizontally and vertically, in most cases facing forward along the direction of the road, offering a limited understanding of the built environment (Fig. 1H) without the panoramic view offered by commercial platforms. It is largely unknown whether perspective (limited, non-panoramic) single images could be sufficient for the same tasks to which panoramic SVI owes its success in the research community. Finally, not all perspective imagery in crowdsourced services is front-facing and following the direction of driving; it may also include lateral perspectives (Fig. 1I), and many others we investigate in this paper, whose usability and purpose remain unexplored.
In this paper, for the first time, we build a structured overview of approaches and a mosaic of indicators found in research. We investigate systematically and comprehensively, with experiments, what are their relationships, uncertainty, and performance; and how restricted perspectives in the growing volume of crowdsourced datasets collected by citizens affect the reliability of analyses typically conducted on panoramic imagery, which are paramount for urban research and planning.
The broader significance of this work is that its findings may lead to the standardisation and awareness of different approaches commonly used in research nowadays, and to giving recognition to crowdsourced street-level imagery and understanding its potential across many domains. Commercial services such as Google Street View and Baidu Maps, while with impressive volume of data, are not available everywhere, and this research may contribute to motivating using crowdsourced counterparts of this data to fill data gaps around the world, and lead to developing best practices (e.g. determining the optimal direction of imaging). Further, giving adequate attention to the theory and foundations, the paper doubles as an overview of these metrics for readers who are not acquainted with this topic and the workflows that underpin numerous studies in multiple application domains.
The outcome of this research is important also for researchers who use panoramic (commercial) imagery, not only to understand the interaction among different approaches to make use of panoramas, but also because if limited views turn out to be sufficient, it may mean that researchers now use more data than necessary. That is, it may not be essential to obtain and process the entire panorama, lowering costs of research as the use cases tend to be data- and power-hungry. Further, due to the increasing barriers, commercial sources should not be taken for granted, so researchers in the future may have to switch to crowdsourced imagery.

Street view imagery
Commercial services such as Google Street View, Baidu Maps, and Tencent offer billions of panoramic images of numerous urban areas around the world, enabling immersive insights in the streetscape and rivalling oblique and satellite imagery traditionally used in many research domains (Helbich et al., 2021;Gaw et al., 2022;Luo et al., 2022a). Companies provide standardised approaches to collect data (multiple cameras mounted on cars), resulting in relatively homogeneous data characteristics around the world.
The rise of Volunteered Geographic Information (VGI) (Feng et al., 2022; Naghavi et al., 2022; So and Duarte, 2020; Yan et al., 2020), where citizens are producers of geospatial data that can be used freely, also gave impetus to volunteered street-level imagery and the development of services such as Mapillary and KartaView, which have gained attention and now host a large volume of data from all over the world (Juhász and Hochmair, 2016; Leon and Quinn, 2019; Zhang et al., 2021b; Mahabir et al., 2020; Ma et al., 2019). Companies have contributed to VGI efforts as well (Anderson et al., 2019; Sarkar and Anderson, 2022), which is also the case for street-level imagery (Huang et al., 2019). Unlike commercial counterparts, the collection of these datasets is not standardised: anyone, anywhere, can collect and share sequential images using any kind of camera and mode of transportation (Hou and Biljecki, 2022; Luo et al., 2022b), which results in diverse imagery (Fig. 2). While there are different approaches and while panoramic imagery is supported, most of the images collected in practice are perspective (non-panoramic, single) images from dashcams and smartphones mounted inside cars, resulting in heterogeneous characteristics. Now that billions of images are available for free and without restrictions, they have been used for some of the same research purposes as commercial panoramic imagery (Karasov et al., 2018; d'Andrimont et al., 2018; Stowell et al., 2020; Lumnitz et al., 2021; Fan et al., 2021; Ding et al., 2021; Raviscioni et al., 2022; Bianconi et al., 2022; Inoue et al., 2022b; Nievas et al., 2022; Esch et al., 2022; Yap et al., 2022a; Yao et al., 2022a; León-Sánchez et al., 2022). Besides accessibility, their advantage is that they are often the only SVI data source at a location, e.g. in places where commercial services do not have any coverage yet (notwithstanding the significant accomplishment of commercial platforms offering data for about half of the world's jurisdictions, there remains an extensive portion of the globe they are yet to cover (Cinnamon and Gaffney, 2021)).
Fig. 2. Examples of crowdsourced images acquired from cars at the same location by different contributors, having different directions, fields of view, and aspect ratios. The increasingly heterogeneous image data on streetscapes necessitates investigation of the influence of myriads of parameters used. While having some advantages, in both cases, their insight into the streetscape tends to be limited in comparison with panoramic imagery typically found in commercial data. Source of data: Mapillary.
Despite such advantages and the several papers cited in the previous paragraph, user-generated street-level imagery remains rarely used in research, and the number of studies relying on it is dwarfed by those using Google Street View and Baidu Maps. Ostensibly, their disadvantage is the field of view limited by low-cost consumer equipment (Fig. 2), in stark contrast to professional panoramic imaging systems developed by commercial providers; thus, researchers may have no motivation to use them if an easily available commercial source of panoramic imagery is available in the same area (Ali-bey et al., 2022). It is obvious that perspective photos (Fig. 2) may have shortcomings when sensing elements such as buildings, greenery, and sky, possibly being limited in measuring the built environment. But is that really the case? Is panoramic imagery truly essential in such studies? In our work we investigate whether such a limited view is actually sufficient to reliably sense commonly measured environmental elements of the streetscape for which researchers overwhelmingly rely on panoramas. Along with that, we investigate also whether the value of panoramic imagery is unparalleled or whether it has been overestimated for frequently employed analyses, and single-frame imagery from crowdsourced platforms such as Mapillary has been overlooked undeservedly.

Characterising common elements: sky, buildings, and greenery
Literature review reveals that the most common urban form elements that are characterised using SVI are buildings (i.e. built form), greenery (or trees, vegetation), and sky (i.e. open canyon space). Thus, we focus on them. Because of the similarities of approaches and logic, and because many studies examine more than one of these elements in conjunction (Fu et al., 2019;Zhou et al., 2022a;Basu and Sevtsuk, 2022;Kawshalya et al., 2022), they are covered together. The explanations in this section are accompanied by Fig. 3 (the bottom portion of the image is related to the methodology of this study and it will be introduced in Section 3).
In general, studies transform the data into a fisheye image pointing up or into multiple perspective images that form a 360° panorama (but often not a vertical panorama, with the top and bottom portions trimmed) (Seiferling et al., 2017; Yao et al., 2021). The former approach (fisheye) has been influenced by the traditional way to compute the Sky View Factor: using cameras with fisheye lenses pointing to the zenith and computing the proportion of the sky hemisphere visible from a ground-level point (Gong et al., 2018; Miao et al., 2020; Lan et al., 2021). The latter one (set of perspective images forming a panorama) is done to mitigate distortions and facilitate labelling and processing with computer vision approaches (Beaucamp et al., 2022). It has been used mostly for parameterising greenery and buildings. For example, a plethora of studies has computed the Green View Index (GVI), which indicates the degree to which a person standing in a certain position can view vegetation (Yang et al., 2009). Looking at Fig. 3, these general practices appear to make sense: a fisheye image pointing to the zenith does justice to the complete parameterisation of sky and open space. On the other hand, it will not fully capture vegetation, as some street-side greenery (e.g. grass) will be left out; thus, panoramas without a full overview of the sky but with the horizon in their focal point may be more suitable for such a purpose. The same goes for studies that are focused on mapping ground-level features such as sidewalks (Kang et al., 2021a; Hosseini et al., 2022; Ning et al., 2022c).
However, there are many studies that do not follow the same rationale. For example, there are those that characterise the proportion of buildings with fisheye imagery (Gong et al., 2018). It is rarely explained why that is the case. It is assumed that researchers do so to avoid including multiple approaches and to keep the derived metrics in the same context, e.g. when computing both sky and greenery, they focus on either of the two approaches.
The first issue in this domain is terminological disparity. There are no standardised names. In fact, in some instances there are no particular terms used at all, especially in those in which many environmental elements of the streetscape are analysed (Wu et al., 2020b). Further, researchers tend to use different terms for the same metric, e.g. the Green View Index is also known as Visible Green Index, an issue that in general pervades lines of research that parameterise the urban form (Xu and Gao, 2022).
In this paper, influenced by the two most prominent metrics (SVF and GVI), and following the names of metrics in some established publications (e.g. Building View Factor in Gong et al., 2018), we adopt the convention that metric names suffix the element with View Factor if they are computed with fisheye imagery pointing to the zenith, while those that are computed from horizontal panoramas use the term View Index, which suggests how much of an object pedestrians observe (see Fig. 3 for reference). Now that the terminology has been clarified, the application of each of these metrics is overviewed.
The SVF has a long history in domains ranging from satellite positioning to urban climate studies as it has a direct effect on the amount of satellite signals and solar radiation received at that point (Chapman et al., 2002; Chen et al., 2022d; Svensson, 2004; Demuzere et al., 2019; Wang et al., 2018; Ignatius et al., 2022). It has been also used for predicting ground shade provided by trees (Sun et al., 2021a), explaining the urban heat island effect (Martin et al., 2022), informing urban planning interventions (Ye et al., 2018), forecasting dengue, explaining health-related factors and physical activity (Lu et al., 2019; Yao et al., 2022b), and understanding pedestrian volume. The counterpart of SVF computed from panoramas, the Sky View Index, has been used in a variety of studies (Jing et al., 2021; Liao et al., 2021; Zhou et al., 2022c; Gao et al., 2022) (we purposefully do not abbreviate this metric to avoid the conflict with SVI that is reserved for street view imagery). Both metrics are encountered with alternative names such as level of open sky, percentage of sky (Verma et al., 2020), and proportion of sky (Yin and Wang, 2016; Gong et al., 2019a).
The SVF can be computed from many sources, such as 3D city models, point clouds, and building footprints, and, most accurately, with field surveys using cameras equipped with fisheye lenses. But SVI, mimicking this approach by transforming the cylindrical panoramas into azimuthal fisheye images, provides a convenient and scalable approach retaining approximately the same level of quality (Dirksen et al., 2019; Chen et al., 2020b; Heo et al., 2020; Yuan and Chen, 2011; Seiferling et al., 2017; Middel et al., 2017, 2018; Gong et al., 2018; Biljecki and Chow, 2022; Zeng et al., 2018; He et al., 2022b; Lipson et al., 2022), and it provides some advantages over other datasets. For example, as 3D city models often contain only buildings (Labetski et al., 2022; Peters et al., 2022; Wysocki et al., 2022), SVI has gained popularity over them thanks to the inclusion of all urban features, most importantly vegetation (Gong et al., 2018).
Moving on to greenery and buildings, both factors and indexes have been used extensively, e.g. Green View Factor (Peng et al., 2022; Hua et al., 2022), Green View Index (Wu et al., 2019; Chen et al., 2020c), Building View Factor (Gong et al., 2018), and Building View Index (Chen et al., 2020a). Greenery is commonly computed conflating trees and grass in a scene (Koo et al., 2022; Yu et al., 2021; Zhang and Hu, 2022), but there have been more specific instances such as Tree View Index (Qiu et al., 2022) and Tree View Factor (Gong et al., 2018), and differentiating different classes of greenery in cities (Sun et al., 2021b). These metrics have been so common and successful that they have transcended the street level, e.g. some have been adopted at other platforms such as window-level in a building (Li et al., 2022c). The proportion of buildings and vegetation in a street scene, either alone or in combination, has been used for estimating the real estate premium (Yang et al., 2021b; Chen et al., 2020a), monitoring street-side changes, and understanding rideshare accessibility, crime occurrence (He et al., 2022a), perception, and air pollution (Wu et al., 2020a).
These dimensionless metrics are not relevant only in urban canyons and densely built areas, but rather for a variety of urban configurations. Thus, in our work, we investigate diverse urban configurations around the world.

Related work
In general, little is known about the impact of parameters of imagery and the topic of errors, uncertainty, and quality in SVI. To the best of our knowledge, the only work related to ours is the one by Kim et al. (2021). It has a dual purpose: understanding the influence of the sampling rate of SVI (e.g. the difference between retrieving imagery every 20 or 100 m along a road) and the influence of considering different combinations of perspective images derived from a panorama (e.g. perspective image directed at the right side of the car vs a panorama with a 360° view). The latter part, referred to as directional setting in their paper, is more related to ours. The study primarily computes indexes such as SVF for each perspective and its correlation with the one derived from the panorama at the same location. Their results suggest that: (i) information is lost when limited views are used; (ii) the results obtained from an image directed at one lateral side (e.g. camera pointing left; cf. Fig. 1I) can be significantly different from those aiming to the other side; and (iii) the correlation (i.e. measure for error or information loss) depends on both the index computed and the view. For example, having only the front view results in correlations between 0.652 and 0.96 for different indexes, while focusing on one index (sky view) computed from different directions reveals that the correlation varies between 0.822 (right view) and 0.961 (back view).
We bring several novelties that have not been the focus of research yet, and several substantial advancements with respect to the state of the art. First, we systematically and comprehensively investigate a large number of scenarios occurring in practice and their relationships (e.g. panoramas obtained from imagery with different aspect ratios, and panoramas obtained from a different number of images, to name a few among many). Second, their work is concentrated on the topic of splitting imagery from panoramas (resulting in fixed fields of view in both horizontal and vertical dimensions), rather than considering also the context of crowdsourced imagery, in which the field of view (FOV) is not standard owing to the myriad of equipment in use, and in which the horizontal and vertical FOV are rarely equal (equal FOVs would result in a square image). In our work, we zero in on the topic of field of view for the first time, and consider multiple options, reflecting different equipment used in practice. We will demonstrate with experiments that this aspect is quite important. Third, their work focuses on a single and limited study area (a town in California, USA), while we include multiple cities around the world to increase the diversity of urban forms. Our decision is affirmed by the fact that we obtained different results by city. In general, SVI studies involving more than one city are scant; thus, our work contributes as a cross-city comparative study in understanding the urban form and intra-city distribution of the aforementioned metrics. Fourth, their results include only one metric (correlation), while our work adds another metric (mean absolute error) that may be more suitable for interpreting the results.

Methodology
In a nutshell, we generate imagery according to different scenarios and approaches, and compare the values of the generated metrics based on a custom-built and highly-adaptable workflow.

Imagery and perspectives
We consider common practices, and regard the following types of perspectives (Figs. 3 and 4):
• Fisheye (zenith). These are used to compute the Building/Green/Sky View Factors.
• Panoramas, which are used to compute the indexes:
  - 4-image panoramas, which are obtained from four perspective images extracted according to cardinal directions (0, 90, 180, and 270 degrees). We noticed that studies in the literature use different aspect ratios, so we compute three variants of these 4-image panoramas, using aspect ratios 1/1 (square, most common), 4/3, and 16/9.
  - 5-image panoramas, which are obtained from the 4 images listed above with an additional one pointing upwards. They are computed only using square images (1/1 aspect ratio). One may wonder why we do not simply focus on this kind of panorama, which gives an entire overview from a location and should always be possible to compute with commercial services. The reasons depend on the use case, e.g. in some studies, panoramas are meant to mimic the view of pedestrians, so overhead views are not included.
• Perspective images (limited views), which represent those taken from consumer cameras and which are found in crowdsourced services, but are also used to construct panoramas when using commercial services (previous point). We regard 8 different directions: the same 4 as above (0, 90, 180, and 270 degrees) and 4 additional (diagonal) ones between them (45, 135, 225, and 315 degrees), which are not common in practice, except in a few publications (Chu et al., 2022) and some crowdsourced circumstances where e.g. dashcams are misaligned, but may be interesting to investigate as they combine longitudinal and lateral imagery (e.g. 45 degrees: a view that includes a bit of the front view and a glimpse of the right). In parallel, to consider the heterogeneous equipment used in practice, we introduce multiple fields of view: 70, 80, and 90 degrees to approximate typical ranges found in wide-angle cameras in consumer smartphones, and 120 degrees to include those that are typical for dashcams and ultrawide-angle cameras in smartphones. Another motivation for the different FOVs is that nowadays, smartphones tend to have multiple cameras (e.g. wide- and ultrawide-angle); thus, given the choice, one may wonder which of these may be better to use for urban sensing. In our exploration of the metadata of a crowdsourced platform, 1 we find that images most commonly have aspect ratios of 4/3 and 16/9, but we include also 1/1 for consistency. Permuting all these parameters, in total, we have 96 such images (Fig. 4), which we analyse separately. Because of the limited perspective, to distinguish them from panoramas, we call the indexes computed from such single perspectives limited indexes.
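The permutation of parameters described above can be sketched as follows. This is a minimal illustration rather than the workflow used in the study: the names `DIRECTIONS_DEG`, `FOVS_DEG`, `ASPECT_RATIOS`, and `vertical_fov` are hypothetical, the diagonal directions are assumed to lie midway between the cardinal ones, and the stated field of view is assumed to be horizontal in a rectilinear (pinhole) projection, which the text does not specify.

```python
from itertools import product
import math

# Parameters as listed in the text (variable names are hypothetical).
DIRECTIONS_DEG = (0, 45, 90, 135, 180, 225, 270, 315)  # 4 cardinal + 4 diagonal
FOVS_DEG = (70, 80, 90, 120)                            # wide to ultrawide
ASPECT_RATIOS = {"1/1": 1.0, "4/3": 4 / 3, "16/9": 16 / 9}  # width / height


def vertical_fov(hfov_deg, aspect):
    """Vertical FOV of a rectilinear image, given its horizontal FOV
    and width/height ratio (assumes a pinhole camera model)."""
    half_h = math.radians(hfov_deg) / 2
    return math.degrees(2 * math.atan(math.tan(half_h) / aspect))


# One dictionary per single-frame scenario.
scenarios = [
    {
        "heading_deg": heading,
        "hfov_deg": hfov,
        "aspect": name,
        "vfov_deg": round(vertical_fov(hfov, ratio), 1),
    }
    for heading, hfov, (name, ratio) in product(
        DIRECTIONS_DEG, FOVS_DEG, ASPECT_RATIOS.items()
    )
]

print(len(scenarios))  # 8 directions x 4 FOVs x 3 aspect ratios = 96
```

Under these assumptions, a wider aspect ratio at a fixed horizontal FOV reduces the vertical FOV, which is one reason why aspect ratio may matter for elements such as sky that sit at the top of the frame.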

Investigated metrics and statistics
We investigate the following five relationships (Fig. 3), and for notation we assign a letter to each. For each relationship, we compute one or two of the following metrics: Pearson correlation coefficient (R) and Mean Absolute Error (MAE). For all relationships, we compute the correlation coefficient to indicate the statistical relationship between two variables. However, we compute the MAE only for those metrics that are comparable, as MAE is used as a measure of error between observations expressing the same phenomenon (e.g. we do not do that when comparing indexes to factors, as it is meaningless to talk about errors in that sense).
Statistic A - Factors and indexes. We start by investigating the relationship between the factors (obtained from fisheye) and the indexes obtained from panoramas (both those computed from 4 and 5 images). Comparing these two key means to measure the urban form (Section 2.2) will shed light on their correspondence and potential differences.
Statistic B - Indexes from different variants of panoramas. While panoramas are used frequently, there are multiple ways to generate them: comprising 4 and 5 images. Thus, we analyse their difference across the three indexes. Besides R, for this relationship, we compute also MAE.
Statistic C - Indexes and limited indexes. This is the central part of the work, analysing the value of single (limited) views in comparison with panoramas, helping to answer whether crowdsourced SVI, which tends to have non-panoramic imagery, is sufficient for sensing the urban form. Using this relationship, we analyse a large number of perspectives with different fields of view, directions, and aspect ratios. Thus, it helps us to understand and compare a variety of questions and scenarios, such as whether having a wide-angle camera pointing longitudinally is better than an ultrawide-angle camera pointing laterally; or, if we had the choice of collecting data with a typical smartphone camera, in which direction it would be most useful to point it when measuring e.g. the Green View Index. This part is also of relevance to researchers using panoramic imagery, as it will help to understand whether it makes sense to use the entire panorama if one of its parts is sufficient, i.e. whether panoramic imagery can be reduced to a limited perspective, achieving similar results but decreasing the complexity of studies.

Statistic D - Factors and limited indexes. Similarly to C, this statistic also caters to understanding the usability of perspective cameras, but in comparison to metrics derived from fisheye imagery. That is, the results obtained from single-frame views are compared to those derived from fisheye imagery.
Statistic E - Different aspect ratios. This relationship regards the different aspect ratios found in practice, and it has two related parts. First, focusing on 4-image panoramas, it compares the different aspect ratios used when generating them, e.g. 4/3 vs 16/9. Second, it seeks to understand the role of consumer cameras with different aspect ratios (e.g. smartphones and dashcams, both used in crowdsourced imagery, tend to have different aspect ratios; see Fig. 2).
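The two statistics used across the relationships above, the Pearson correlation coefficient (R) and the Mean Absolute Error (MAE), can be illustrated with a short sketch. The function names and the sample values are hypothetical and serve only to show the computation; they are not results from the study.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation coefficient between two paired series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])


def mean_absolute_error(a, b):
    """Mean absolute difference between two paired series that
    express the same phenomenon on the same scale."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean(np.abs(a - b)))


# Hypothetical values: a panoramic index and a limited index computed
# at the same five locations (proportions between 0 and 1).
panoramic_index = [0.12, 0.30, 0.25, 0.05, 0.40]
limited_index = [0.10, 0.33, 0.20, 0.08, 0.45]

print("R   =", pearson_r(panoramic_index, limited_index))
print("MAE =", mean_absolute_error(panoramic_index, limited_index))
```

R captures how closely the two series move together regardless of scale, whereas MAE is only meaningful when both series measure the same quantity, which is why it is reserved for comparable metrics.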

Study area, data, and image segmentation
Departing from the vast majority of studies focusing on a single city (Section 2.2), we choose five cities, with diverse morphological, geographical, and planning characteristics around the world: Amsterdam (the Netherlands), Hong Kong (China), Kampala (Uganda), San Francisco (USA), and Singapore. The importance of measuring the urban form across diverse areas is attested by the fact that all of these cities have been the subject of past studies on the urban form (Mukisa et al., 2019; Rafiee et al., 2016; Liang et al., 2019; Sevtsuk et al., 2021; Wong and Kardinal Jusuf, 2010; Hosseini et al., 2022; Yang et al., 2021a; Gong et al., 2019b; Chen and Biljecki, 2023). Fig. 5 illustrates the diversity of our study areas, and considering such diversity will reveal whether the relationships we investigate are consistent across various morphologies. The data has been obtained from Google Street View. We have obtained imagery at 691,750 locations. For each location, we have obtained a fisheye image, a few panoramas (5-image, and multiple 4-image panoramas with different aspect ratios), and the 96 limited perspectives as illustrated in Fig. 4, resulting in nearly 70 million images (and more than 200 million metrics we compute from them). For the hemispherical transformation of the panorama into the fisheye photograph looking vertically (upwards with the zenith in the centre), we use the upper half of a panorama, and reproject it.
F. Biljecki et al.
Fig. 4. The single image scenarios we simulated, as permutations of common aspect ratios, fields of view, and directions. We compute the three indexes of the urban form (which we dub as limited indexes to distinguish them from those generated from 360° imagery) for each scenario. Not all scenarios are shown for space constraints. The figure also expounds the metrics relevant for the limited perspectives. Source of the imagery: Google Street View.
The factors have been computed using approaches found in the state of the art (Miao et al., 2020; Xia et al., 2021; Gong et al., 2018).
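The hemispherical transformation described above can be illustrated with a short sketch. It is not the authors' implementation: it assumes an equirectangular panorama whose top row is the zenith and whose middle row is the horizon, an equidistant fisheye projection (radius proportional to the zenith angle), and nearest-neighbour sampling. The function name `panorama_to_fisheye` and the `size` parameter are hypothetical.

```python
import numpy as np

def panorama_to_fisheye(pano: np.ndarray, size: int = 512) -> np.ndarray:
    """Reproject the upper half of an equirectangular panorama into an
    upward-looking fisheye image (assumed equidistant projection)."""
    h, w = pano.shape[:2]
    half = h // 2  # rows 0..half-1 run from the zenith down to the horizon

    # Pixel grid of the output image, centred on the zenith.
    ys, xs = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2.0
    dx, dy = xs - c, ys - c

    r = np.hypot(dx, dy) / c                    # 0 at zenith, 1 at horizon
    azimuth = np.arctan2(dy, dx) % (2 * np.pi)  # angle around the zenith
    inside = r <= 1.0                           # pixels within the fisheye disc

    # Nearest-neighbour lookup in the panorama.
    rows = np.clip(np.round(r * (half - 1)).astype(int), 0, half - 1)
    cols = np.floor(azimuth / (2 * np.pi) * w).astype(int) % w

    out = np.zeros((size, size) + pano.shape[2:], dtype=pano.dtype)
    out[inside] = pano[rows[inside], cols[inside]]
    return out
```

The same function can be applied to a segmentation mask instead of an RGB panorama, so that the fisheye view carries class labels; pixels outside the disc are left at zero and would need to be excluded from any ratio computed afterwards.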
For the segmentation, we used Cityscapes as the training dataset, which is a large-scale database that focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 19 classes, including the ones in focus (cf. Fig. 1). After comparing different semantic segmentation models, we adopted the state-of-the-art model, DeepLabV3, as the Mean Intersection over Union (mIoU) can exceed 80% in this task (Chen et al., 2018).
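Once an image is segmented, a view index reduces to the share of pixels assigned to a class. The sketch below illustrates this under stated assumptions: the mask uses the Cityscapes train IDs (building = 2, vegetation = 8, sky = 10), and greenery is represented by the vegetation class alone; whether other classes such as terrain are counted is a choice the sketch does not settle. The names `CLASS_IDS` and `view_index` are hypothetical.

```python
import numpy as np

# Assumption: mask values follow the Cityscapes train IDs.
CLASS_IDS = {"building": 2, "vegetation": 8, "sky": 10}

def view_index(mask: np.ndarray, class_id: int) -> float:
    """Share of pixels in a segmentation mask that belong to one class."""
    return float(np.mean(mask == class_id))

def all_indexes(mask: np.ndarray) -> dict:
    """Building, green, and sky view indexes for one segmented image."""
    return {name: view_index(mask, cid) for name, cid in CLASS_IDS.items()}
```

For a fisheye mask, the ratio would have to be restricted to pixels inside the fisheye disc rather than taken over the whole square image.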
The statistical distributions (Fig. 5) reveal the diversity and data distribution in our study. For example, one might have expected that Hong Kong, well known for its urban jungle dotted by the world's largest number of skyscrapers, would yield a distribution with higher building indexes/factors and lower sky view indexes/factors than other cities. However, much of the imagery is taken all around the city, including suburban villages, country parks and nature reserves, adding to the diversity of settings in our research.

Results
Because of the large number of results but limited space, this section features the most important results across the five statistics in focus. Most of the attention is given to limited perspectives, the central focus of this paper, which are dealt with by statistics C and D. The scatter plots contain a random subset of the observations to avoid clutter.

Factors and indexes (Statistic A)
Fig. 6 portrays the relationship between the factors (computed from fisheye images) and the indexes (computed from 4- and 5-image panoramas) as correlation coefficients and scatter plots. There are a few important observations. The correlation coefficients, while in general strong, are different for the three urban elements. The weakest (albeit still strong) relationship is for greenery. This could be explained by the fact that the Green View Factor (computed from fisheye) misses ground-level vegetation such as grass, which is included in Green View Indexes (both 4- and 5-image panoramas). On the other hand, 4-image panoramas have a limited peek into overhead greenery, which is well captured in fisheye imagery. Therefore, a key observation is that the different approaches may capture different aspects of urban greenery.
As expected, all elements in 4-image panoramas have a somewhat weaker correlation than in 5-image panoramas (their relationship is investigated in the next section as a separate statistic). As is the case with tall greenery, 4-image panoramas in high-rise settings may not be able to capture the full vertical extent of buildings (hints of these cases are visible in Fig. 3 and some example images in Fig. 5), which affects the computation of the Sky View Index as well, resulting in over- and under-estimations, respectively. A 5-image panorama exhibits a very strong correlation with its counterparts (factors) derived from fisheye imagery. This suggests that in many cases there is little difference between using a panoramic image and a fisheye image, although not always: especially for greenery, the correspondence may vary and be less strong.
Finally, in Fig. 7, we reveal major discrepancies between the two. The two perspectives yield almost the same results across three indexes in cases of low-rise architecture and in homogeneous streetscapes. Further, unsurprisingly but worth mentioning, the discrepancies tend to be quite small or absent where there are no particular features present (e.g. the GVI and GVF are both zero in areas without any or with very little greenery, thus, their mismatch is also around zero; see some settings in Fig. 5). The major discrepancies where the fisheye perspective provides substantially different results from panoramas are often locations with high-rise buildings and grasslands. That is, the fisheye perspective will capture the full vertical extent of a building (and any other tall feature such as overhead vegetation), while panoramas will provide a complete view of the ground level where the grass is (which in the fisheye perspective is below the horizon and cut off).

Indexes from different variants of panoramas (Statistic B)
There is a strong relationship between the indexes derived from the 4- and 5-image panoramas (Fig. 8). If the goal of a study is to capture the entire streetscape including the overhead view, clearly a 5-image panorama will give more meaningful results. However, the results indicate that when one uses 4-image panoramas, the results are close to the 5-image approach.

Indexes and limited indexes (Statistic C)
In this statistic, we investigate the use of perspective images commonly found in crowdsourced imagery. To simplify the interpretation of the large number of combinations, we consider the most common ones in practice: images with 4/3 aspect ratio taken with wide and ultrawide cameras (FOVs of 70, 80, 90, and 120 degrees) and the ultrawide instance (FOV of 120 degrees) that has the aspect ratio of 16/9 to account for contributors that use dashcams and similar (denoted with an asterisk in the visuals). In Figs. 9 and 10 we give the results for all these combinations. The most important result is that in many scenarios, the measured indexes are very strongly correlated with metrics obtained from panoramas, suggesting that images with limited perspectives, whose restricted views are considered to hamper their adoption in the research community, may actually be useful for inferring the urban form and come close to the metrics estimated from panoramic imagery. The strongest correlation is exhibited by ultrawide-angle longitudinal recordings, e.g. images with a FOV of 120 degrees taken in the direction of driving have correlation coefficients of 0.942, 0.950, and 0.927 with those obtained from panoramas, for the BVI, GVI, and SVI, respectively. Conveniently, this appears to be the most common provenance found in practice in crowdsourced street-level imagery.
The results are different for all the considered characteristics of equipment. As was the case in the previous statistics, the results also differ for each element of the urban form, but they are in line with each other across the characteristics such as field of view and direction. Below, we dissect them.
Starting with the direction: the direction in which the camera is pointing has a significant influence, confirming the findings of Kim et al. (2021). In the cases of all elements, the images tend to give more truthful results if they point front (0) or back (180) rather than to the sides (90 and 270). Moreover, while front and back views tend to achieve almost the same results, the left or right direction (90 vs 270 degrees) of the side views makes a notable difference to the results. The diagonal views (45, 135, 225, and 315 degrees) do not bring much value, as they have only a marginal improvement over lateral views, and cannot compete with front and back views. Thus, starting to collect such imagery in practice would not be helpful, but at the same time such imagery, which is sometimes found in practice, would not compromise estimating these metrics.
The FOV has a notable influence on the results (see Fig. 10). However, while it is not a surprise that images obtained with ultrawide cameras will give better results than those with a narrower angle, the results suggest that it is not as important as the direction of view. For example, a wide-angle camera (e.g. FOV = 80) pointing longitudinally (direction = 0 or 180) will give more truthful results (R = 0.89, for BVI) than an ultrawide-angle camera (e.g. FOV = 120) directed to the side (e.g. direction = 90) (R = 0.86), despite the much wider view offered by the latter.
Fig. 10. The performance of using perspective (single) street view imagery for inferring the urban form depends on the camera type used and direction of the view (Statistic C). The asterisk indicates the images with the wider aspect ratio, which is frequent for ultrawide angle cameras.
As another outlook on the results, we computed the proportion of imagery that has an index within 0.05 of the one computed from a 4-image panorama, as the standard way to compute such metrics. For FOV = 120 and direction 0 (one of the most common combinations of parameters found in practice), for building, greenery, and sky, in 78%, 62%, and 72% of cases, respectively, such images yield indexes that are within 0.05 in value from those derived from 4-image panoramas. This finding uncovers the tremendous potential of freely available crowdsourced imagery collected with low-cost cameras. Further, the results suggest that in most studies researchers may be using much more data than they need. Instead of using several images from stitched panoramas in Google Street View, they may reduce data sourcing to single images, dramatically simplifying workflows and reducing computational complexity while maintaining comparable accuracy.
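The three agreement measures used in this statistic (correlation coefficient, mean absolute error, and the share of images within 0.05 of the panoramic value) can be computed per scenario as in the following sketch. It assumes two equally long arrays holding the limited and the panoramic index for the same locations; the function name `agreement` and its parameter names are hypothetical.

```python
import numpy as np

def agreement(limited, reference, tolerance=0.05):
    """Compare indexes from single images with those from panoramas."""
    limited = np.asarray(limited, dtype=float)
    reference = np.asarray(reference, dtype=float)
    abs_err = np.abs(limited - reference)
    return {
        # Pearson correlation coefficient between the two sets of indexes.
        "pearson_r": float(np.corrcoef(limited, reference)[0, 1]),
        # Mean absolute error in index units.
        "mae": float(abs_err.mean()),
        # Share of locations whose index lies within the tolerance.
        "share_within_tolerance": float(np.mean(abs_err <= tolerance)),
    }
```

Running this once per combination of direction, field of view, and aspect ratio yields a table of the kind summarised in the figures.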
The results so far have been analysed for all five cities together. For further analysis, the discrepancies are grouped by city and urban form (Fig. 11). The middle plot illustrates the difference between the limited indexes and the panoramic indexes for each city (the top part of the figure shows the metrics together for reference). It suggests that there are some differences among the cities. While indexes related to buildings and greenery are comparable among cities, the differences are a bit more pronounced for the Sky View Index, with each city having its set of results. The differences are not substantial, thus, we conclude that our findings apply worldwide to most if not all cities.
Next, we focus on decomposing the results by the urban form, focusing on the sky, which exhibited larger differences than the other two elements in the previous paragraph. We bin the locations into 10 quantiles by the value of the SVF of each, and analyse the errors for each quantile (bottom plot in Fig. 11). The results are analysed in isolation for multiple FOVs and directions, and exhibit different behaviour. It can be concluded that the urban form has some influence on the results, and the exact statistics will depend on the particular city. Further, the differences among the cities may not only be due to the different urban form, but also to the different approaches to collecting data in them. For example, in some cities, Google Street View collects data from parks and hinterlands (Chen and Biljecki, 2023), which have a significantly different array of indexes and factors, but such acquisition may not be conducted in some other cities, biasing the results.
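The binning of locations into 10 quantiles by SVF can be sketched as follows. This is an illustration rather than the procedure used to produce the figure: it assumes the error of interest is the absolute difference between the limited and the panoramic index, and the function name `error_by_quantile` is hypothetical.

```python
import numpy as np

def error_by_quantile(svf, limited, reference, n_bins=10):
    """Mean absolute error of the limited index per SVF quantile bin."""
    svf = np.asarray(svf, dtype=float)
    abs_err = np.abs(np.asarray(limited, dtype=float)
                     - np.asarray(reference, dtype=float))

    # Quantile edges of the SVF distribution (n_bins + 1 values).
    edges = np.quantile(svf, np.linspace(0, 1, n_bins + 1))
    # Assign each location to a bin 0..n_bins-1 using the interior edges.
    bins = np.digitize(svf, edges[1:-1], right=True)

    return [
        float(abs_err[bins == b].mean()) if np.any(bins == b) else float("nan")
        for b in range(n_bins)
    ]
```

Repeating the call for each field of view and direction gives one error curve per scenario, which makes it easier to see whether enclosed or open settings drive the discrepancies.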
Finally, in Fig. 12, we give examples for very accurate and least accurate estimations, to give an understanding of scenarios that are suitable and less suitable for imagery with limited views. In these examples, we focus on forward-facing perspective views (direction = 0) at the field of view of 90 degrees and aspect ratio 4:3. Analysing the differences in the indexes computed from such perspective and panoramic imagery, we uncover that many of the major discrepancies in which the perspective images are not representative of the surrounding (panoramic) environment are those collected at the end of a street where a particular element (building, greenery) dominates the image. Further examples include images collected on underpasses (specifically for the Sky View Index), bridges over water (i.e. having an open sky around but not in front), hilly areas, and situations with large objects on the sides that are missed in front-facing images such as sideways trees outside the view (for similar scenarios see also the top left and middle left settings in the image as examples for greenery and building, respectively). In contrast, locations where a single image is representative of the entire surrounding are typically homogeneous and symmetric landscapes or those that are diverse but where by chance the metrics correspond in both perspectives (e.g. in the figure, the middle right example indicates a heterogeneous and asymmetric setting where the GVI is 0.254 both when computed from a single and a panoramic image).

Factors and limited indexes (Statistic D)
As in the previous statistic, we analyse the performance of limited perspectives against the data that is usually obtained from commercial services, this time with the factors, which are derived from fisheye images. Fig. 13 suggests that although the correlations are weaker than is the case with panoramas (Statistic C), they are still very strong for certain combinations, such as an ultrawide-angle front-facing image. We hope that these results will motivate future studies to consider crowdsourced imagery in research that traditionally engages commercial (panoramic) data.

Different aspect ratios (Statistic E)
Finally, we examine the influence of the different aspect ratios. For limited perspectives, a hint of this influence is given in the previous two statistics (see Figs. 9 and 13), in which a certain scenario (FOV = 120) is compared between 4/3 and 16/9 perspectives, which are found across different cameras used in practice, and the results show an almost negligible influence. The influence of the aspect ratio is elaborated further here.
We start with the different aspect ratios (4/3 and 16/9, see Fig. 4 for their comparison) of perspective images. For BVI and GVI, in all cases of directions, the correlation coefficient is above 0.99 and the errors are small (MAE of up to 0.04), and in the shadow of the previous characteristics that have a more profound influence. Even in the least favourable combination (direction = 90 and FOV = 70) the correlation coefficient is still very strong (R = 0.975), not necessitating further research on this aspect that appears to have a consistently small influence.
Fig. 11. The results (Statistic C in this case) are influenced not only by the different element of the urban form (i.e. building), but also city and its urban morphology.
Next, we investigate the relationship between regular 4-image panoramas (computed from perspective images with an aspect ratio of 1/1) with those computed with narrower aspect ratios such as 4/3 and 16/9, which are found in some instances. The correlations for 4/3 panoramas are 0.994, 0.994, and 0.990, for building, green, and sky view indexes, respectively. The results for 16/9 variants are: 0.978, 0.977, 0.961. Unless there is a strong reason (such as perception studies that would mimic the human vision, e.g. see Walker et al., 1990) to extract narrow panoramas, it is better to follow usual practices with 1/1 aspect ratios.

Aggregating the measurements: urban scale analysis of the sky view
The point-level results are often aggregated to understand the spatial distribution of the urban form in cities and the relationships with other variables (e.g. socio-economic indicators) (Demuzere et al., 2019; Gong et al., 2019a; Liao et al., 2021; Liu et al., 2021; Sun et al., 2021a,b; Verma et al., 2020; Wang et al., 2018; Wu et al., 2020b; Zhang et al., 2019) (Fig. 1 bottom right), e.g. at the scales of blocks (Yao et al., 2021), subdistricts (Lu et al., 2019), districts (Seiferling et al., 2017), and city-scale. Researchers also aggregate data at regular grids (Gong et al., 2019a; Dirksen et al., 2019; Verma et al., 2020; Ito and Biljecki, 2021). Such spatial aggregations are predominantly focused on the mean value of an indicator at all imaged points in a zone (e.g. cell of a grid), but some researchers also compute the variation therein, such as standard deviation.
As another angle of interpreting the results, we aggregated them in a city-scale analysis and compared the results to gauge the performance of single-frame images for this purpose. In this analysis, we focused on Singapore (a morphologically diverse city; see Fig. 5), the sky view, and the perspective image that is commonly found in crowdsourced imagery and when generating panoramas (direction of zero and FOV of 90 degrees), and compared the results to values obtained from fisheye and panorama (Statistics C and D; see Sections 4.3 and 4.4).
We aggregated the results according to a regular grid system, H3, a geospatial indexing system that partitions the world into hexagonal cells and has been adopted in various disciplines (Bousquin, 2021; Woźniak and Szymański, 2021; Zhao et al., 2023; Liang et al., 2023). Each cell has an area of about 0.10 sq. km and contains 51 panoramas on average.
To put the results in context and enable straightforward comparison, for each approach, we compute the standard score (Z-score) of each cell, i.e. the number of standard deviations a value is above or below the mean value at the city scale. The resulting two maps (Fig. 14) are very similar to each other. The correlations between the two are very strong (0.951), with the one to a 4-image panorama even stronger, at 0.981. Computations for building and greenery exhibit a similarly strong relationship. This finding indicates that non-panoramic SVI, typically found in platforms curating user-generated data, may be used for some use cases without a noticeable loss in accuracy. It also suggests that when the results are spatially aggregated, they tend to be more accurate, diminishing the image-level uncertainty of different perspectives (e.g. the R for that scenario at the image level, i.e. Statistic D, is noticeably smaller, at 0.84; see Fig. 13).
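The aggregation and standardisation steps can be illustrated with the sketch below. It assumes that every image already carries the identifier of the grid cell it falls in (for instance an H3 cell ID obtained beforehand), and that the Z-score uses the population standard deviation over all cells; neither detail is specified above. The function names `cell_means` and `z_scores` are hypothetical.

```python
from collections import defaultdict
import numpy as np

def cell_means(cell_ids, values):
    """Mean index value per grid cell."""
    groups = defaultdict(list)
    for cell, value in zip(cell_ids, values):
        groups[cell].append(value)
    return {cell: float(np.mean(v)) for cell, v in groups.items()}

def z_scores(means):
    """Standard score of each cell relative to the city-wide distribution."""
    cells = sorted(means)
    x = np.array([means[c] for c in cells])
    z = (x - x.mean()) / x.std()  # assumption: population standard deviation
    return dict(zip(cells, z.tolist()))
```

Computing the Z-scores separately for the single-image and the fisheye (or panoramic) values and then correlating them over the shared cells, e.g. with `np.corrcoef`, gives a cell-level counterpart of the image-level correlation.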

Main insights and recommendations
The previous section has presented a comprehensive series of novel and insightful results, which will be elaborated further and untangled in this section.
When it comes to existing practices using panoramic imagery, it seems that there is a general correspondence in the results among the different approaches, with some exceptions in specific settings (cf. Fig. 12), and with varying magnitudes (e.g. for green view, the correlation is less strong than for sky view).
Because in certain cases single images come very close in deriving the same results, instead of fetching and processing numerous images, perhaps the process could be reduced considerably by deriving only the one with the most favourable combination of the direction and field of view, which may simplify data collection and reduce the computational complexity of their processing.
This result is also favourable for crowdsourced street-level imagery in which non-panoramic imagery still accounts for the vast majority of imagery found in mainstream platforms such as Mapillary and KartaView. Do we therefore suggest replacing commercial data sources with crowdsourced alternatives in future studies? Yes and no. It is certainly good news that in the majority of cases, a limited view may be sufficient to infer the urban form, potentially unlocking the value of billions of user-generated free images, some of which are available in locations where commercial services provide no coverage (León-Sánchez et al., 2022). But issues remain, and further investigations are necessary. For example, the quality of volunteered SVI is heterogeneous, primarily in terms of completeness and image quality (Quinn and León, 2019; Hou and Biljecki, 2022), aspects that are out of the scope of this paper.
The results also depend on the urban form element that is measured. Estimating the sky view seems to be the most sensitive one (cf. Fig. 10), thus caution should be exercised in morphologies with street canyons and researchers should be aware of the shortcomings of certain configurations.
As is the case with other error propagation studies, the interpretation of the results depends heavily on what the data is used for. For example, an error of 0.05 in the computation of the Sky View Factor may not be a big deal in the general characterisation of the urban form, but may be influential in some other use cases. Experiments for specific use cases (e.g. using the Green View Index for walkability studies) need to be conducted to understand sensitivity to errors and establish application-specific data requirements.

Influence of the driving direction
The cities we focused on have different traffic rules. In bidirectional traffic, this means that in a city that drives on the left-hand side of the road (e.g. Singapore), the direction of 90 degrees (on the right side of the car) may capture a wider spatial extent than in a city that drives on the right (e.g. NYC), in which case the view in the same direction is closer to the street-side (see the top part of Fig. 15). We investigate this matter and group the results by this aspect. It appears that there is a notable influence of this aspect (see the middle of Fig. 15), which we raise for the first time in the SVI community, and which may warrant further investigations considering the growing volume of comparative studies involving multiple cities, which may involve contrasting traffic rules. The analysis focuses on the performance of the perspective images versus panoramas (Statistic C). For longitudinal views (directions of 0 and 180), there is no significant difference in the results, in contrast with the situation for lateral directions (90 and 270). However, in all cases, left-hand traffic cities have better results, suggesting that the influence may be biased by the physical difference among cities (e.g. see the previously discussed Fig. 11).
The different elements are affected differently: building and greenery exhibit mostly similar results, but they are different from sky. In the case of the Sky View Index, in almost all directions there is no difference for left- and right-hand traffic, including for the direction of the camera on the left (270), while a notable difference is found in the opposite side direction (90). The illustration and the examples from crowdsourced imagery in the figure affirm these results by demonstrating that the same direction from different sides of the road does not reveal sky and buildings entirely, and it is heavily influenced by the physical configuration of the streetscape. Further aspects that influence such results may be the share of one-way or bidirectional roads and their width.
This aspect remains inconclusive and open for debate, but it is necessary to bring it up in this paper as it has not been raised in the literature so far and it may be relevant for studies involving multiple cities. It is a conjecture that will require further analysis, as it is not possible to isolate the effect of the side of the traffic from the influence of the urban form of the city and other influencing factors. For example, detailed geospatial analysis examining the influence and contribution of, e.g., the number of lanes of a road, directionality, and size of street/building setbacks and buffers, and accounting for the urban form (e.g. comparing only clusters of similar streets in different cities) will reveal the interaction of these aspects precisely.

Limitations and directions for future work
Some of the limitations of this study stem from SVI and its analysis in general, and this work is driven by practices of typical studies in the field, mirroring their limitations. SVI is a boon and a bane for measuring the urban form metrics: its advantage is that it can be used to calculate a holistic (or 'all-inclusive') SVF, and at the same time its disadvantage is that it does not differentiate between the contributions of various types of features in the built environment (e.g. the SVF taking only buildings into account). Thus, it may be used in conjunction with other spatial data for a more insightful understanding of the SVF (e.g. a 3D building model may help to remove the contribution of buildings to isolate the SVF provided solely by vegetation). Further, imagery is often available only on roads (Chen and Biljecki, 2023), thus, the estimation of the indexes is possible only within their confines.
Following the practices of uncertainty propagation studies in GIScience, we have relied on simulating perspective views from panoramas to establish a controlled environment, which enabled us to set a variety of parameters and isolate the uncertainty induced by a specific aspect such as the limited field of view. In this work, it might have been possible to compare panoramas from a service such as Google Street View with crowdsourced perspective images from Mapillary or KartaView in the same locations. However, that may not be a viable direction as in practice, images from different platforms will mismatch in many aspects, such as exact location of the panorama, time of acquisition, and quality of imagery. All these external factors will impact the results of the estimations, and the differences between two data sources will be a composite of all kinds of aspects and errors, e.g. different camera heights (Yan and Huang, 2022; Aikoh et al., 2023), heterogeneous completeness (Hou and Biljecki, 2022; Quinn and León, 2019; Seto and Nishimura, 2022), seasonality (Chiang et al., 2023; Han et al., 2023), and further inconsistencies such as varying light conditions and attribute errors (Zheng and Amemiya, 2023), making it difficult to isolate the influence of particular aspects such as field of view and direction, on which we largely focused in this paper. Thus, both a strength and limitation of this work is that we use simulated data and focus on the aspect of camera properties and acquisition practices, and their influence in inferring the urban form. Future research will account for other aspects such as completeness and positional resolution of imagery, but likewise, this might be possible only with processed and controlled data.
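To make the idea of simulating a perspective view from a panorama concrete, the sketch below extracts a rectilinear (pinhole) view with a given direction and horizontal field of view. It is an illustration under stated assumptions rather than the procedure used in the experiments: the panorama is equirectangular, its centre column faces the direction of travel (direction 0), the camera is not tilted, and sampling is nearest-neighbour. The function name `perspective_from_panorama` and its parameters are hypothetical.

```python
import numpy as np

def perspective_from_panorama(pano, direction_deg, fov_deg, width, height):
    """Sample a pinhole-camera view from an equirectangular panorama."""
    hp, wp = pano.shape[:2]
    # Focal length in pixels implied by the horizontal field of view.
    focal = (width / 2.0) / np.tan(np.radians(fov_deg) / 2.0)

    # Image-plane coordinates relative to the principal point.
    xs = np.arange(width) - (width - 1) / 2.0
    ys = np.arange(height) - (height - 1) / 2.0
    x, y = np.meshgrid(xs, ys)

    # Viewing ray of each pixel as longitude/latitude, rotated by the heading.
    lon = np.arctan2(x, focal) + np.radians(direction_deg)
    lat = np.arctan2(-y, np.hypot(x, focal))

    # Corresponding panorama pixel (longitude wraps, latitude is clipped).
    cols = np.floor((lon / (2 * np.pi) + 0.5) * wp).astype(int) % wp
    rows = np.clip(np.floor((0.5 - lat / np.pi) * hp).astype(int), 0, hp - 1)
    return pano[rows, cols]
```

The aspect ratio follows from the `width` and `height` arguments, so a 4/3 and a 16/9 view with the same horizontal field of view differ only in their vertical extent.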
As in all studies employing SVI, the segmentation that we used is not perfect (see Fig. 1 for examples of small typical misclassifications in the images), thus, the results are influenced to some degree by such uncertainty, which cannot be quantified due to cultural, architectural, and geographical differences (Inoue et al., 2022a). While we employ the same model everywhere, which may help minimise the influence of model errors, it remains unknown how much these imperfections affect the results. Further, we noticed that in some instances, certain houses are blurred for privacy reasons. Some of these are blurred with a soft filter (see an example in the upper sample for San Francisco in Fig. 5, at the right), resembling the sky and morphing the building into it, thus, the model has misclassified that part of the image, an error that propagates to the computation of the indexes. However, this is in line with all studies in the field, which have a degree of uncertainty in the segmentation process, and such errors should not have a major influence on our findings, especially when dealing with millions of images.
When designing the method, we explored a crowdsourced platform to find the most common practices of capturing street-level imagery around the world (e.g. field of view, direction). The parameters we have established (Section 3) account for the vast majority of data, and they are expected to remain the same in the foreseeable future. However, it is not possible to capture all camera parameters and settings that are used in practice.
In our experiments, we did not take into account errors such as significant vertical tilt of a camera. For example, a dashcam may not always be pointing at the horizon. For future work, it would be beneficial to investigate whether a dashcam pointing upwards may give better results by encompassing the vertical extent of urban canyons.
Our work focuses on data from GSV, which is overwhelmingly collected with cars on driveable roads, which is also in line with the content of crowdsourced services such as Mapillary. However, an increasingly evident strength of crowdsourced services is off-road imagery and imagery from other transportation modes, e.g. those collected by pedestrians on sidewalks and in pedestrian zones, and cyclists on dedicated bike lanes, with its importance heralded by recent studies (Chen and Biljecki, 2023). It might be beneficial to take into account such modes, and perform a comparative study between imagery collected at different positions on the same street (e.g. sidewalk vs roadway).
Finally, in our work, we have focused on probably the most common use of SVI: inferring the urban form and greenery with image segmentation. However, the results may differ for other use cases, such as mapping street furniture and infrastructure (Zhong et al., 2021; Zhou et al., 2022b) and extracting building information (Ning et al., 2022b; Yan and Huang, 2022; Sun et al., 2022). For example, the latter may be significantly more sensitive to the lack of panoramic images because not all buildings are clearly visible from single-frame imagery pointing in the direction of driving.

Conclusion
Many research groups worldwide practising various urban disciplines rely on street view imagery, an emerging but widely available source of geospatial data and valuable means for urban sensing. Many of them use numerous images of streetscapes to infer the urban form and greenery at a high spatial resolution and accuracy. Examining the methodologies employed, we encounter disparate approaches among different research domains computing similar indicators, an aspect that may cause inconsistencies and one that warrants investigations. Further, street-level imagery has become an important component of the landscape of crowdsourced geospatial data, as billions of images have been collected and shared by volunteers around the world, potentially competing with commercial sources, which have been used extensively. As the urban data research ecosystem is shifting towards openness in terms of software and data (Yap et al., 2022b; Fleischmann, 2019; Boeing, 2017; Rey, 2019; Liu et al., 2015; Benitez-Paez et al., 2018), it is important to intensify investigating their reliability and give them consideration for environmental and urban studies. Our work represents a significant initial stride towards achieving this goal by suggesting that the limited views of single-frame cameras and the use of consumer equipment may not represent a hindrance, but it recommends further investigations by incorporating other quality aspects that typically burden crowdsourced geographic information, such as heterogeneous completeness.
We think that this research is especially relevant given the increasing restrictions of commercial sources of SVI: urban data researchers may soon be impelled to divorce from their long dependency on Google Street View and Baidu Maps, in favour of crowdsourced imagery released under a free licence and involving citizens. However, that will come at the expense of forgoing data that is acquired in a standardised manner and one that gives an immersive view into streetscapes and the built environment thanks to panoramic imagery with a great regard to high quality, consistent weather conditions, and metadata. Thus, this research is timely to provide an understanding of the usability of limited imagery and set the scene for further research in understanding the usability of crowdsourced street-level imagery obtained from volunteers with consumer devices.
For the first time, this study gives a comprehensive set of insights into the influence of the parameters and practices for computing several urban form indicators such as the Sky View Factor, Green View Index, and Building View Index. There are many takeaways from our study, most importantly across three matters:
• Panoramic imagery. Our literature review, which clarifies the metrics and investigates their interaction, suggests that not all researchers follow the same approaches, a problem aggravated by terminological inconsistencies. We establish that the results differ depending on how the same dataset is used, but in most cases the difference is minor. However, researchers should be careful with specific use cases, as some of them have associated limitations (e.g. an upward-pointing fisheye image appears to be the appropriate way to capture the open sky in urban canyons, but such a perspective may not be suitable for estimating the total amount of greenery as it tends to overlook grasslands). Further, the results suggest that studies may be using more images than necessary, and our findings may help simplify such studies.
• Single imagery. We quantify the performance of single (limited) images in comparison with the traditionally used panoramic imagery, and demonstrate that they can be a reasonable replacement for panoramas and may satisfy many use cases. In this respect, we conclude that narrow views are able to give broad perspectives. We hope that this finding will simplify workflows by reducing the volume of imagery used, and even spur the community to consider the hitherto overlooked user-generated images in crowdsourced platforms, taking advantage of their free availability and flexibility, and of the fact that they are available in various places where commercial counterparts are not present or have outdated photos. The results indicate that all the investigated camera characteristics play a role, but to varying degrees: the direction of the image is more important than the field of view and aspect ratio of the camera. Further, when the data is spatially aggregated (e.g. at district scale), the uncertainty tends to be very low.
• External factors. The computation of indicators is somewhat affected by the urban form, city, perspective, and even traffic rules, but the results can be considered to apply globally. Understanding such relationships is complex and is a viable direction for future work.
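To make the comparison between single-frame and panoramic imagery more concrete, the following is a minimal sketch of how a view index could be computed and compared across the two image types. It is not the pipeline used in this study. It assumes that each image has already been semantically segmented into a 2D mask of integer class labels, and the label id `VEGETATION`, the function names, and the district identifiers are hypothetical and introduced only for illustration.

```python
from math import sqrt
from statistics import mean

VEGETATION = 1  # hypothetical class id; depends on the segmentation model used


def view_index(mask, target):
    """Share of pixels in a 2D label mask (list of rows) equal to `target`."""
    total = sum(len(row) for row in mask)
    hits = sum(1 for row in mask for label in row if label == target)
    return hits / total


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of index values."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def aggregate_by_district(values, districts):
    """Mean index value per district, given one district id per image."""
    groups = {}
    for value, district in zip(values, districts):
        groups.setdefault(district, []).append(value)
    return {district: mean(vals) for district, vals in groups.items()}
```

Under these assumptions, `view_index` would be evaluated on the panoramic and the single-frame mask of each location, `pearson_r` would summarise their agreement at the image level, and `aggregate_by_district` would be applied to both sets of values before comparing them at a coarser spatial scale.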
As autonomous vehicles, which rely on a variety of sensors such as cameras, become more common, the data they collect may become a source of street-level imagery and thus contribute to the growing volume and quality of openly shared data.
The broader contribution of our work is to help uncover the potential of crowdsourced street-level imagery, the visual frontier of volunteered geographic information, and to prompt further investigations, such as those into the heterogeneous quality and completeness of crowdsourced data, and the development of benchmarks to identify the optimal parameters of street view imagery while minimising computational complexity.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
The authors do not have permission to share data.