The green view dataset for the capital of Finland, Helsinki

Recent studies have complemented the more traditional ways of mapping city greenery with human-perspective methods, such as measuring green view from street view images [1]. Green view describes the relative amount of green vegetation visible at street level and is often measured with the green view index (GVI), which describes the percentage of green vegetation in a street view image or images of a certain location [2]. The green view dataset of Helsinki was created as part of the master's thesis of Akseli Toikka at the University of Helsinki [3]. We calculated the GVI values for a set of locations on the streets of Helsinki using Google Street View (GSV) 360° panorama images from summer months (May through September) between 2009 and 2017. From the available images, a total of 94 454 matched the selection criteria. These were downloaded using the Google application programming interface (API). We calculated the GVI values from the panoramas based on the spectral characteristics of green vegetation in RGB images. The result was a set of points along the street network with GVI values. By combining the point data with the street network data of the area, we generated a dataset of GVI values along the street centre lines. Streets with GVI points within a threshold distance of 30 meters were given the average of the GVI values of those points. For the streets with no points in the vicinity (∼67%), the land cover data from the area was used to estimate the GVI, as suggested in the thesis [3]. The point and street-wise data are stored in georeferenced tables that can be utilized for further analyses with geographical information systems.



Value of the data
• Instead of the traditional overhead view, the GVI dataset presents the human-level aspect of the distribution of city greenery in Helsinki.
• The dataset can be beneficial for decision makers and urban planners.
• The dataset can be used in various studies, for example those investigating the effects of city greenery on human physical and mental health. It can also be beneficial when planning more pleasant cycling and pedestrian paths and neighbourhoods.
• When combined with other existing city greenery datasets, the green view dataset can help to build a more holistic understanding of the city greenery in Helsinki.

Data
The two datasets in this article describe the GVI of the city of Helsinki, Finland, determined using GSV images. The datasets are in GeoPackage format, which is an open standard format for transferring geospatial information. The dataset "greenery_points.gpkg" contains points, located on the street network of the area, for which the GVI was determined from the GSV images. All the attributes related to the points are listed in Table 1. The dataset "greenery_roads.gpkg" contains the line features of the street network dataset [9], enriched with GVI values calculated from the nearby points from the dataset "greenery_points.gpkg" and from the land cover data [12]. The attributes related to the street geometries are listed in Table 2. Fig. 1 shows the format in which the GSV images were downloaded. Fig. 2 illustrates the method that was used to determine the amount of vegetation in the images. Fig. 3 visualizes the GVI values of the road network in the dataset "greenery_roads.gpkg".

[Table 1, surviving fragment: ... GVI of the image taken towards the heading 300°; Gvi_Mean: the mean GVI of the 6 images (= panorama GVI).]

GVI point data from GSV images (greenery_points.gpkg)
The green view point dataset of Helsinki is stored in a GeoPackage where every row contains information on the amount of vegetation in one GSV panorama. The column descriptions are presented in Table 1. The dataset contains information on the 94 454 panoramas downloaded through the Google API. The coordinate reference system of the file is WGS 84 (EPSG:4326).

Street-wise GVI data (greenery_roads.gpkg)
The green view network dataset of Helsinki is stored in a GeoPackage where every row represents a single street segment and contains its GVI values. The column descriptions are presented in Table 2. The coordinate reference system of the file is WGS 84 (EPSG:4326). Besides the green view-related columns, the dataset includes columns that allow routing operations with the dataset (id, street name, road class, functional class, traffic direction, type of the street segment and a binary cycleway column).
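Because a GeoPackage is an SQLite container, the layers registered in either file can be inspected with Python's standard library alone. The sketch below is only illustrative: it assumes the files follow the GeoPackage standard and therefore contain the `gpkg_contents` table, and the function name `list_gpkg_layers` is hypothetical. The layer names inside the two files are not stated in this article, so none are assumed here.

```python
import sqlite3


def list_gpkg_layers(path):
    """Return (table_name, data_type, srs_id) for each layer in a GeoPackage.

    Assumes the standard `gpkg_contents` table is present in the file.
    """
    con = sqlite3.connect(path)
    try:
        rows = con.execute(
            "SELECT table_name, data_type, srs_id "
            "FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()
    return rows


# Example call (file name taken from the article):
# for name, kind, srs in list_gpkg_layers("greenery_points.gpkg"):
#     print(name, kind, srs)
```

For actual spatial analysis a GIS library or desktop GIS would normally be used to read the geometries; this listing only helps to confirm what a file contains and which coordinate reference system identifier it declares.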

GVI point data
The data was acquired and analysed using the Python programming language and its libraries. The scripts used in the process can be found in the GitHub repositories Helsinki_GreenView and Helsinki_GreenNetwork.
The first step of the process was to determine the relevant locations for GSV panoramas. The streets from OpenStreetMap within the Helsinki region (excluding motorways, walkways and bridle paths) were used as the input street network data. The line network was transformed into points with a minimum distance of 20 m, using the script CreatePoints.py. After generating the point data, the Google API was queried to find GSV images near the points. The querying was done with several URL requests by using the script MetadataCollector.py. If a panorama image existed within 50 m of a point, the metadata of that panorama was saved in a text file. The metadata contains the panorama ID, the month and year in which the image was taken, and the spatial coordinates.
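The conversion of street lines into sampling points can be sketched as follows. This is not the CreatePoints.py script itself: it is a minimal illustration that assumes the line coordinates have already been projected to a metric coordinate system and that points are placed at an even 20 m spacing. The function name `points_along_line` is hypothetical.

```python
import math


def points_along_line(coords, spacing=20.0):
    """Place a point every `spacing` metres along a polyline.

    `coords` is a list of (x, y) vertices in a projected, metric CRS.
    The first vertex is always included.
    """
    points = [coords[0]]
    dist_to_next = spacing  # distance left before the next point is due
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already covered on this segment
        while seg_len - pos >= dist_to_next:
            pos += dist_to_next
            t = pos / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            dist_to_next = spacing
        # Carry the unused remainder of this segment over to the next one.
        dist_to_next -= seg_len - pos
    return points
```

Measuring the spacing along the line, rather than as a straight-line distance between points, keeps the sampling density constant on curved streets.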
For the analysis, only the images in which the foliage is green were relevant. Therefore, only the panoramas that were taken between May and September were downloaded, using the script GSV_image_downloader.py. Every panorama was downloaded as six horizontal images (Fig. 1) with a set of URL requests with the following parameters: "pano" = panorama ID of the image, "size" = 400 × 400 pixels, "fov" (field of view) = 60° and "pitch" = 0°. The clockwise direction of the view (parameter "heading") was either 0° (North), 60°, 120°, 180°, 240° or 300°. The images were saved and named after the panorama ID and heading.
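The month filter and the six requests per panorama can be sketched as below. This is an illustration rather than the GSV_image_downloader.py script. It assumes the Street View Static API endpoint, a capture date given as a "YYYY-MM" string in the metadata, and an API key passed in the `key` parameter. The function names and the placeholder key are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Street View Static API.
BASE_URL = "https://maps.googleapis.com/maps/api/streetview"
HEADINGS = (0, 60, 120, 180, 240, 300)  # degrees clockwise from North


def is_summer(date_str):
    """True if a 'YYYY-MM' capture date falls between May and September."""
    month = int(date_str.split("-")[1])
    return 5 <= month <= 9


def image_urls(pano_id, api_key):
    """Build one request URL per heading for a single panorama."""
    urls = {}
    for heading in HEADINGS:
        params = {
            "pano": pano_id,
            "size": "400x400",  # 400 x 400 pixels
            "fov": 60,          # field of view in degrees
            "pitch": 0,
            "heading": heading,
            "key": api_key,
        }
        urls[heading] = f"{BASE_URL}?{urlencode(params)}"
    return urls


# Example: keep only summer panoramas, then build their URLs.
# if is_summer("2014-07"):
#     urls = image_urls("SOME_PANO_ID", "YOUR_API_KEY")
```

With a 60° field of view, the six headings tile the full 360° horizon without overlap, which is why six images are enough to represent one panorama.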
To calculate the GVI values for the relevant locations, each downloaded image needed to be processed. The first step of image processing was to segment the image into homogeneous clusters. This was achieved with the Mean Shift segmentation algorithm [4] with the following parameters: spatial radius = 6 pixels, range radius = 7, and minimum size of the cluster = 40 pixels (Fig. 2B). After segmentation, pixel values were scaled to a range between 0 and 1 by dividing the pixel values by 255.
The number of pixels representing vegetation in the segmented image was calculated by using the "excess green" (ExG) vegetation index [5,6] and Otsu's threshold method [7]. Two binary images were calculated based on threshold values in light and shadow. Threshold values based on sample street greenery pixels were used to distinguish vegetation in different lighting conditions. An additional two binary images were calculated, one for light and one for shadow conditions, using the threshold values determined in the Treepedia project [8]. The four binary images were combined into one by multiplying the light and shadow images with each other and then summing up the resulting two images. This produced a binary image with vegetation pixels and non-vegetation pixels represented with values one and zero, respectively (Fig. 2C). Finally, the GVI for the i-th image from location p is defined as

$$\mathrm{GVI}_{p,i} = \frac{\text{Number of vegetation pixels}_{p,i}}{\text{Number of pixels}_{p,i}}$$
Using the GVI values of the downloaded images, the GVI for each panorama can be determined. The GVI for a panorama p is defined here as the average of the GVI values of the six individual images from that location:

$$\mathrm{GVI}_p = \frac{1}{6}\sum_{i=1}^{6} \mathrm{GVI}_{p,i}$$

The results were compiled in a table by using the script GVI_to_shp.py, which saves the results as a georeferenced data file that is ready for further analyses.
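As a rough illustration of the image-level and panorama-level definitions, the sketch below computes an excess green value per pixel, splits the values with Otsu's method, and averages the per-image results. It is a deliberately simplified stand-in for the published scripts. It skips the Mean Shift segmentation, the separate light and shadow masks and the Treepedia thresholds, and it assumes the common form ExG = 2g - r - b on values scaled to 0..1. All function names are hypothetical.

```python
def excess_green(r, g, b):
    """ExG = 2g - r - b for channel values already scaled to 0..1."""
    return 2 * g - r - b


def otsu_threshold(values, bins=64):
    """Threshold that maximises the between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # uniform input: nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_var, best_k = -1.0, 0
    w0, sum0 = 0, 0.0
    for k in range(bins - 1):
        w0 += hist[k]
        sum0 += centers[k] * hist[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_k = var_between, k
    return lo + (best_k + 1) * width  # upper edge of the best split bin


def image_gvi(pixels):
    """Fraction of vegetation pixels; `pixels` is a list of (R, G, B) in 0..255."""
    exg = [excess_green(r / 255, g / 255, b / 255) for r, g, b in pixels]
    threshold = otsu_threshold(exg)
    vegetation = sum(1 for v in exg if v > threshold)
    return vegetation / len(exg)


def panorama_gvi(image_gvis):
    """Mean of the per-image GVI values of one panorama."""
    return sum(image_gvis) / len(image_gvis)
```

Note that Otsu's method always splits a non-uniform image into two classes, so on its own it would label some pixels as vegetation even in an image without any. Additional threshold masks, such as those described in the text, are one way to guard against this. The functions return fractions in the range 0..1; multiplying by 100 gives percentages.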

Street-wise GVI data
To create the green view network for Helsinki, the GVI values from the georeferenced points were aggregated to the street network by using the script Street_view_GVI_to_network.py and stored in the column GSV_GVI. The street network used in this work, the MetropAccess-CyclingNetwork [9], is based on the national Digiroad data [10] and was further modified [11] to model walking and cycling in Helsinki.
For each street segment that had GVI points within 30 m, the GVI value for the segment was defined as the average of those GVI points. Since GSV images are available from car-accessible streets only, this operation produced a GVI value for 37 772 of the 56 904 segments.
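The 30 m aggregation rule can be illustrated with plain geometry. The sketch below is not the Street_view_GVI_to_network.py script: it assumes coordinates in a projected, metric system (the published files are in WGS 84, so a reprojection would be needed first) and treats a street segment as a single straight line between two vertices. The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Projection of p onto the line, clamped to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def segment_gvi(segment, gvi_points, max_dist=30.0):
    """Mean GVI of the points within `max_dist` of the segment, else None.

    `gvi_points` is a list of ((x, y), gvi) pairs.
    """
    a, b = segment
    nearby = [
        gvi for xy, gvi in gvi_points
        if point_segment_distance(xy, a, b) <= max_dist
    ]
    return sum(nearby) / len(nearby) if nearby else None
```

Returning `None` for segments without nearby points keeps "no street view data" distinct from a genuine GVI of zero, which matters when a fallback estimate is applied later.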
The GVI for the segments lacking the street view-based index was calculated with the regional land use data [12] from Helsinki. The data contains polygons indicating the areas with trees in the city. In the thesis [3], it was shown that the street segment-wise GVI values were strongly correlated with the amount of tree cover in the nearby area. Therefore, for the street segments without GVI points in their vicinity, the GVI values were estimated as the fraction of tree cover in their neighbourhood:

$$\mathrm{LU\_GVI}_s = \frac{\text{Total area of tree cover polygons within buffer}_s}{\text{Total area of buffer}_s}$$

where buffer_s is the street segment s buffered by 30 m. The values are scaled to a range of 0-100%, consistent with the street view-based GVI values. These complementary values were calculated using the script Land_use_GVI_to_network.py and stored in the column LU_GVI. Finally, a combined GVI column (Comb_GVI) was created, in which the street view-based GVI was used for the segment if available and the land use-based GVI was used otherwise. The resulting green view street network was saved to a file. Fig. 3 shows the GVI variation in the streets of Helsinki.
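The fallback logic behind the combined column can be sketched in a few lines. This is an illustration rather than the Land_use_GVI_to_network.py script: it assumes the tree cover area inside each buffer has already been computed with a GIS overlay, and it represents a missing street view-based value as `None`. The function names are hypothetical.

```python
def lu_gvi(tree_cover_area, buffer_area):
    """Tree cover share of the 30 m buffer, as a percentage (0-100)."""
    return 100.0 * tree_cover_area / buffer_area


def combined_gvi(gsv_gvi, lu_gvi_value):
    """Prefer the street view-based GVI; fall back to the land use estimate."""
    # Compare against None explicitly so that a measured GVI of 0 is kept.
    return gsv_gvi if gsv_gvi is not None else lu_gvi_value


# Example: a segment without street view data whose buffer of 6000 m2
# contains 1500 m2 of tree cover.
# combined_gvi(None, lu_gvi(1500.0, 6000.0))
```

The explicit `is not None` test is a small but deliberate choice: a simple truthiness check would wrongly replace a measured GVI of zero with the land use estimate.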