Urban



Introduction
Urban green spaces are crucial both for the well-being of people and for environmental sustainability in cities (Jenks & Jones, 2009; Lee & Maheswaran, 2011; MEA, 2005). Green spaces, the network of parks, forests and other green areas in an urban structure, provide opportunities for physical exercise and recreation, in addition to various other benefits to people (Gómez-Baggethun et al., 2013; Tyrväinen, Mäkinen, & Schipperijn, 2007). Access to and availability of green spaces are related to social and environmental justice (Ngom, Gosselin, & Blais, 2016; Wolch, Byrne, & Newell, 2014) and physical and mental health (Engemann et al., 2019; Tomita et al., 2017; Ward Thompson et al., 2012), even if the linkages are sometimes complex (Anguelovski, Cole, Connolly, & Triguero-Mas, 2018).
As the urban population of the world grows rapidly (DESA/UN-WUP, 2018), urban areas need to expand or densify to accommodate it, which often happens at the expense of green spaces (Haaland & van den Bosch, 2015; Zhou & Wang, 2011). Planning sustainable and socially equal cities requires knowledge about how people value and use urban green spaces (Burkhard, Kroll, Nedkov, & Müller, 2012). For example, understanding the role of urban green spaces in recreation and social interaction is important when carrying out land use planning under urban development pressure (Haaland & van den Bosch, 2015).
Obtaining comprehensive data on the use of and values related to green spaces is challenging. Recreational use and preferences related to urban parks have been studied using questionnaire surveys (Tyrväinen et al., 2007), activity diaries (Mytton, Townsend, Rutter, & Foster) and public participation geographic information systems (PPGIS) (Brown & Kyttä, 2014; Brown, Schebella, & Weber, 2014; Laatikainen, Tenkanen, Kyttä, & Toivonen, 2015). These approaches provide in-depth understanding of green space use, but are often limited in duration and frequency, because data collection that involves active participation can be time-consuming for everyone involved.
On the other hand, new digital data sources are emerging from different types of crowdsourcing initiatives, citizen science projects and big data feeds. Crowdsourcing and citizen science projects gather information and insights from active contributors via online tools (See et al., 2016), whereas big data refer to overwhelming amounts of diverse information produced constantly by and about people through online networks and different digital sensors (boyd et al., 2012; Kitchin, 2014). For instance, GPS-enabled mobile devices and online platforms hosting geolocated user-generated content provide detailed information about the whereabouts and activities of people over time in large quantities. Different elements of user-generated data sets, such as geotags, timestamps, content and user information, provide new possibilities for studying the use of green spaces from different perspectives (Di Minin, Tenkanen, & Toivonen, 2015) (Fig. 1). These data provide new opportunities for understanding how cities function in space and time (Batty, 2013) and how human-nature interactions occur in different environments (Toivonen et al., 2019). There are certainly theoretical, technical and ethical challenges in applying these new data sources. However, even imperfect measures, when used with caution and critical thinking, are better than no consideration of the value of nature in decision making (Daily, Postel, Bawa, & Kaufman, 1997).
The aim of this paper is to discuss what kinds of information about urban green space use and values can be extracted from different types of user-generated geographic information and applied to urban green space planning. We use the concept of user-generated geographic information to capture different kinds of novel digital data sources that involve the interaction of people and location-based technologies. We compare four different types of data for analysing green space use and preferences: social media data, sports tracking data, mobile phone operator data and PPGIS data.
Stemming from the framework proposed by Di Minin et al. (2015) (Fig. 1), we study Helsinki, Finland, as our case area and explore: 1) where the spatial hotspots of green space use are, 2) when people use green spaces, 3) what activities are present in green spaces and 4) who are using green spaces based on available sample data sets. Finally, we compare the different data sets as a source of knowledge about urban green space usage and we discuss the potential and challenges of using user-generated data to inform sustainable spatial planning and green space management in cities.

Characteristics of user-generated geographic information
The data sets used in this study represent various kinds of user-generated geographic information ranging from passively contributed to actively contributed data (See et al., 2016). Social media data and mobile phone data are originally generated for purposes other than research or volunteered mapping efforts, and thus they fall into the category of passively or indirectly contributed data. In contrast, PPGIS data are generated by users who actively participate in a map-based survey, and the users are aware that the data will be used for planning or research. Sports tracking data and other similar GPS-based activity tracking data often fall into the category of passively contributed data if acquired from the data provider in large quantities, but can also be gathered as smaller samples from active study participants. While different types of user-generated data sets describe people's whereabouts, activities or preferences to some extent, they differ in technical details and thematic content (Fig. 2). These properties affect how empirical analysis can be done and what conclusions can be drawn from each data set. Passively contributed data are often more voluminous, but might lack necessary background information and other details. Actively contributed data often allow the collection of relevant details, but might be limited in duration and extent (Levin, Lechner, & Brown, 2017).

Fig. 2. Examples of a) social media data, b) sports tracking data, c) mobile phone data and d) PPGIS data from green spaces in Helsinki, Finland. Social media data and PPGIS data represent the original user-generated points, and the spatial accuracy of these points may vary from exact to coarse. Sports tracking data and mobile phone data are in an aggregated format. In addition to varying spatial units, the attributes of the data sources also differ. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
V. Heikinheimo, et al. Landscape and Urban Planning 201 (2020) 103845

Spatial accuracy is a key aspect when analysing user-generated geographic information. Social media data might be attached to exact coordinates, or to coarser points of interest based on place names (Hochmair, Juhász, & Cvetojevic, 2018; Toivonen et al., 2019). Sports tracking data are originally captured as exact GPS points, but often delivered in aggregate format. Allocating mobile phones to locations depends on the antenna network (Ahas, Silm, Järv, Saluveer, & Tiru, 2010; Järv, Tenkanen, & Toivonen, 2017). PPGIS data often represent markers that users have placed on a map in a web browser, and the precision can depend on the zoom level and local knowledge of the user (Brown, 2012).

Social media data
Social media refers to web-based services that allow people to create and share content in online communities (McCay-Peet & Quan-Haase, 2017). Flickr and Twitter are the most used social media data sources in environmental studies (Ghermandi & Sinclair, 2019), mostly because data are publicly available via the platforms' application programming interfaces (APIs). Panoramio, Instagram, Facebook and Foursquare have also been popular data sources for studying human activities and presence in nature. While most environmental studies have used data from a single platform (Ghermandi & Sinclair, 2019), combinations of data sources may produce the best understanding of green space visitors.
There has been a recent rapid increase in the use of social media for studying people's activities and preferences in geographical research. Location-based social media data are used widely in urban research ranging from urban form and structure to activity practices of people (Crooks et al., 2015; Huang & Wong, 2016; Shelton, Poorthuis, & Zook, 2015; Steiger, Westerholt, Resch, & Zipf, 2015) and increasingly in environmental studies (Ghermandi & Sinclair, 2019; Toivonen et al., 2019). Studies have found that social media usage rates reflect observed visitation rates in popular nature destinations such as national parks (Wood, Guerry, Silver, & Lacayo, 2013) and urban parks in large cities (Donahue et al., 2018; Hamstead et al., 2018).

Sports tracking data
Sports tracking applications allow people to trace their physical activities using satellite navigation and mobile devices. Sports tracking applications, such as Strava, Sports Tracker and MapMyFitness track the user's activities by combining information from different sensors, such as GPS, accelerometer and heart rate monitor, and sometimes also manually entered information about the activity (Lendák, 2016).
Sports tracking data may contain information about recreational activities, such as biking and walking, and utilitarian activities, such as commuting to and from work (Oksanen, Bergman, Sainio, & Westerholm, 2015). Previous studies have used GPS-based activity data to study the relationship of physical activity and urban green infrastructure (Vich, Marquet, & Miralles-Guasch, 2019), as well as on- and off-trail use in protected areas (Norman & Pickering, 2017) and urban parks (Korpilo, Virtanen, & Lehvävirta, 2017). Previous studies found similar patterns in sports tracking data and manual cycling counts (Jestico, Nelson, & Winters, 2016; Oksanen et al., 2015), but sports tracking data are also known to be biased towards men and active athletes.

Mobile phone data
Studies using mobile phone data are often based on passive mobile positioning data (as opposed to active tracing of mobile phones), which refers to information that mobile phone operators automatically store in log files. Mobile phone data contain information about the location of mobile devices in the mobile phone operator's network, that is, the network antenna to which a mobile phone is connected. Mobile phone data may contain information about calls, text messages, data usage and network connection attempts. As the network antennae are situated with varying densities, the spatial accuracy of mobile phone data can vary. However, the latter can be enhanced through spatial interpolation (Järv et al., 2017).

PPGIS data
Public participation geographic information systems (PPGIS) and participatory geographic information systems (PGIS) refer to approaches that combine participatory methods with geographic information technologies to collect insights from the general public, often in the context of a planning process (Brown & Kyttä, 2014). PPGIS surveys often allow users to mark places, routes and areas on a (web-based) map when answering a set of questions. Responses may be gathered from household sampling groups or volunteer participants (Brown, 2017).

Green spaces in Helsinki
Our sample data cover Helsinki, the capital of Finland. Helsinki is a medium-sized city with a population of 648 042 in 2018 (Statistics Finland, 2019). In addition to the inhabitants, the city is visited by tourists, who also contribute to the pool of user-generated data. In 2017, Helsinki recorded 4.2 million overnight stays, 46% by domestic and 54% by foreign travellers (City of Helsinki, 2018).
Densification of the city structure is one of the major goals in the general plan of Helsinki, and strategic planning of the green space network aims to ensure a multifunctional and connected green infrastructure into the future (Jaakkola, Böhling, Nicklén, & Lämsä, 2016). From a regional perspective, the core of the green space network in Helsinki is the so-called 'green fingers': six sections of green infrastructure that extend from the coast towards the north (Fig. 3).
In this study, we used green space polygons from the register of public areas in Helsinki (City of Helsinki, 2019), and we further identified regionally important green spaces using a polygon layer to delineate the green fingers (Jaakkola et al., 2016). In the grid-based comparisons, we used 250 m × 250 m statistical grid squares that have their centroid inside the green finger polygons.

Sample data sets
We gathered available user-generated data from various sources and subset all data sets to the extent of Helsinki (Table 1). More details about the data sets and data processing are available in the Supplementary materials. Additional documentation and Python scripts used for producing results in this paper are available online at: https://github.com/DigitalGeographyLab/some-urbangreens.
Social media data ('posts') were collected from three different platforms: Flickr, Instagram and Twitter. We accessed publicly available geotagged social media data from Flickr API (www.flickr.com/services/api/), Instagram API (www.instagram.com/developer/), and Twitter API (developer.twitter.com). We subset the data to one calendar year (2015 for Flickr and Instagram, 2017 for Twitter) according to data availability, and identified points that spatially intersect with the green spaces. We also calculated the number of social media users in each 250 m × 250 m analysis grid square. See further details about social media data collection and pre-processing in the Supplementary materials (S1-S3).
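As a rough illustration of the per-square user counts described above, the following Python sketch assigns projected point coordinates to 250 m squares and counts the unique users in each square. It is not taken from the project's scripts: the function names, the sample records and the grid origin at (0, 0) are assumptions made for this example, whereas the statistical grid used in the study has its own fixed origin and identifiers.

```python
from collections import defaultdict

CELL_SIZE_M = 250  # grid resolution reported in the paper


def grid_cell(x, y, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the square containing a point.

    Assumes x and y are in metres in a projected coordinate system and
    that the grid origin is at (0, 0), which is a simplification.
    """
    return (int(x // cell_size), int(y // cell_size))


def users_per_cell(posts, cell_size=CELL_SIZE_M):
    """Count unique users per grid square.

    posts: iterable of (user_id, x, y) tuples.
    """
    users = defaultdict(set)
    for user_id, x, y in posts:
        users[grid_cell(x, y, cell_size)].add(user_id)
    return {cell: len(ids) for cell, ids in users.items()}


# Hypothetical records, for illustration only.
posts = [
    ("u1", 10.0, 20.0),
    ("u1", 30.0, 40.0),    # same user, same square: counted once
    ("u2", 100.0, 200.0),
    ("u3", 260.0, 20.0),   # falls in the neighbouring square
]
print(users_per_cell(posts))
```

Counting users rather than posts keeps a single prolific user from dominating a square, which matters for the hotspot comparison later in the paper.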
Sports tracking data from the Strava platform were acquired as an aggregated data set, which contains information about the number of athletes, trips and commuting trips per minute on each road segment (Strava Metro 2016; Tarnanen, Salonen, Willberg, & Toivonen, 2017). We identified road segments that intersected with the green spaces using a spatial overlay. We joined information about the number of athletes and trips to the 250 m × 250 m analysis grid by selecting the maximum number of athletes from intersecting road segments for each grid square. See further details about the Strava data user base in the Supplementary material (S4).
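The join described above can be sketched in a few lines of Python. The sketch assumes that the spatial overlay has already been computed, so that each road segment comes with the list of grid squares it intersects; the function name, the input layout and the sample values are hypothetical and are not drawn from the Strava data set.

```python
def max_athletes_per_cell(segments):
    """Keep the largest athlete count among segments crossing each square.

    segments: iterable of (athlete_count, intersecting_cells) pairs, where
    intersecting_cells is a list of grid square identifiers. The overlay
    that produces these lists is assumed to have been done beforehand.
    """
    best = {}
    for athletes, cells in segments:
        for cell in cells:
            best[cell] = max(best.get(cell, athletes), athletes)
    return best


# Hypothetical segments, for illustration only.
segments = [
    (12, [(0, 0), (1, 0)]),  # one segment crossing two squares
    (30, [(1, 0)]),
    (5, [(0, 0)]),
]
print(max_athletes_per_cell(segments))
```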
Passively collected mobile phone data from a two-and-a-half-month period (28.10.2017-9.1.2018) were acquired from a major Finnish mobile network operator. The mobile phone data used in this study are based on the number of hourly data use attempts (e.g. browsing the internet on a mobile device or email synchronization) made by the users in the mobile network (Bergroth, 2019). The data set was anonymized by the mobile network operator, and we aggregated data from regular weekdays (Monday to Thursday) into 250 m × 250 m statistical grid squares at an hourly interval prior to further analysis (Bergroth, 2019). Each grid square in the final data set contains information about the relative share of estimated population across the whole region. For green spaces, we considered mobile phone data from grid squares having their centroid inside the green finger polygons. In spatial and temporal comparisons, we used the information from 4 p.m. to 5 p.m., because this time interval received the most activity across green spaces in the mobile phone data set. For more details about the data processing, see Supplementary material (S5) and Bergroth (2019).
PPGIS data were acquired from two surveys conducted by the local municipality: the Helsinki 2050 survey from 2013 and the National Urban Park Survey from 2017. The Helsinki 2050 survey was conducted to support the development of an upcoming general plan for the City of Helsinki (Kahila-Tani, Broberg, Kyttä, & Tyger, 2016). The survey contained 16 questions in total, of which we considered two questions related to green spaces. The National Urban Park Survey was designed to support the planning of a national urban park in Helsinki, a project that aims to secure cultural and landscape values in cities. We considered all 12 questions that allowed users to add point markers on the map.
For both data sets, we only considered PPGIS point markers located in Helsinki, and further only those markers intersecting the green spaces (Table 1). We also calculated the number of PPGIS users in each 250 m × 250 m analysis grid square. Both PPGIS data sets are openly available online in the Helsinki Region Infoshare open data portal (https://hri.fi/en_gb/). See further details about the PPGIS data sets in the Supplementary material (S6).

Spatial and temporal analysis
We first inspected what proportion of each data set was produced in green spaces, in comparison with data across the whole city. In further spatial comparisons we used the 250 m × 250 m grid squares that have their centroid inside the green finger polygons in order to make the other data sets comparable with the mobile phone data.
We identified hotspots as the top quintile (20% of grid squares) of each data set, based on the number of users in the green space grid squares. We compared the spatial hotspots using the Jaccard index, following the approach of Lehtomäki, Tuominen, Toivonen, and Leinonen (2015). The Jaccard index measures the similarity between two sets and is calculated as the size of their intersection divided by the size of their union. A value of 1 indicates complete overlap between the two sets, and a value of 0 means the two sets do not overlap.
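A minimal Python sketch of the hotspot comparison follows. It selects the top 20% of grid squares by user count for two data sets and computes the Jaccard index of the resulting sets. The function names and sample counts are hypothetical, and the handling of ties and of squares with zero users is an assumption, since the text does not specify either.

```python
def hotspots(user_counts, share=0.2):
    """Return the set of squares in the top `share` ranked by user count.

    user_counts: dict mapping a grid square identifier to a user count.
    Ties at the cut-off are broken by input order (an assumption).
    """
    ranked = sorted(user_counts, key=user_counts.get, reverse=True)
    n_top = max(1, int(len(ranked) * share))
    return set(ranked[:n_top])


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical user counts for ten grid squares in two data sets.
data_a = {"A": 50, "B": 40, "C": 30, "D": 20, "E": 10,
          "F": 9, "G": 8, "H": 7, "I": 6, "J": 5}
data_b = {"A": 12, "B": 1, "C": 15, "D": 2, "E": 3,
          "F": 4, "G": 5, "H": 6, "I": 7, "J": 8}

hot_a = hotspots(data_a)  # squares A and B
hot_b = hotspots(data_b)  # squares C and A
print(jaccard(hot_a, hot_b))  # one shared square out of three in the union
```

Because the index only looks at set membership, it compares where the busiest squares are and ignores how large the underlying user counts are, which makes data sets of very different volume comparable.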
We compared the temporal patterns of social media data, sports tracking data and mobile phone data from green spaces using temporal plots that show the relative share of user activity at each time unit. We aggregated the data per hour of the day, weekday and month, when possible, to allow for visual comparison of the temporal patterns.
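The relative shares used in the temporal plots can be approximated as follows. This sketch counts records per time unit and divides by the total; the function name and sample timestamps are hypothetical, and it assumes that timestamps are already in local time and that each record (not each unique user) counts once.

```python
from collections import Counter
from datetime import datetime

# Functions that extract the time unit of interest from a timestamp.
UNITS = {
    "hour": lambda t: t.hour,
    "weekday": lambda t: t.weekday(),  # Monday is 0, Sunday is 6
    "month": lambda t: t.month,
}


def relative_share(timestamps, unit):
    """Return the share of records falling into each time unit.

    Only units that occur in the data appear in the result.
    """
    counts = Counter(UNITS[unit](t) for t in timestamps)
    total = sum(counts.values())
    return {key: n / total for key, n in sorted(counts.items())}


# Hypothetical timestamps, for illustration only.
stamps = [
    datetime(2015, 6, 5, 8, 15),
    datetime(2015, 6, 6, 17, 5),
    datetime(2015, 6, 6, 17, 40),
    datetime(2015, 7, 1, 18, 30),
]
print(relative_share(stamps, "hour"))
print(relative_share(stamps, "month"))
```

Normalising to shares rather than raw counts is what allows data sets with very different volumes to be compared on the same axis.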

Content analysis of social media data
We conducted manual content classification for a subset of the social media images from Flickr and Instagram. We selected a sample of social media posts for the content classification based on a spatial intersection between the social media points and a coarse-resolution polygon of the regionally important green spaces ('green fingers') retrieved from a regional land use plan from the Helsinki-Uusimaa Regional Council. See further details about the data pre-processing in the Supplementary material (S2). In total, we manually classified the content of 15 312 Instagram photos and 1843 Flickr photos.
The objectives of the content analysis were to assess data quality by determining how many of the photos located in green spaces actually contained information about green spaces, and to understand what activities from green spaces are portrayed on social media. Advertisements and photos taken inside people's homes were classified as not relevant. Relevant images from green spaces were further labelled if they contained activities (such as jogging or biking) or a landscape (a picture with a visible horizon and/or a wide view of the landscape). One picture could be allocated to several categories.

Language identification
Strava data and PPGIS contained some metadata about the data producers (see Supplementary materials S4 and S6), while social media data and mobile phone data lacked any additional information about who produced the data. We chose to demonstrate how language identification could improve our understanding of the data producers and green space visitors.
We limited language identification to the most extensive social media data set available, namely Instagram. Instagram posts have relatively long caption texts, which improves automatic language identification (Baldwin & Lui, 2010).
We identified the language of all posts by first pre-processing the captions and segmenting them into sentences, as proposed by Hiippala, Hausmann, Tenkanen, and Toivonen (2019). We then used a pre-trained model capable of identifying 176 languages (Bojanowski, Grave, Joulin, & Mikolov, 2016) to identify the language of each sentence. We excluded both short sentences (< 7 characters) and those identified with low confidence (< 0.5). We then summarized the language information for each user. If the user had used more than one language, we first excluded English from the list of languages identified, as social media users often post in English in addition to their first language (Hiippala et al., 2019). If more than one language remained after removing English, we assumed that the most frequently used language corresponded to the user's primary or first language. For the final analysis, we only considered languages with more than 10 users in our data set. Finally, we summarized the number of different languages in the 250 m × 250 m analysis grid squares across green spaces.
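The per-user decision rule described above can be written down compactly. The sketch below starts from sentences that have already been labelled by a language identification model, so the model itself is not called here. The function name, the two-letter language codes, the sample sentences and the tie-breaking behaviour are assumptions made for the example.

```python
from collections import Counter

MIN_CHARS = 7         # sentences shorter than this are excluded
MIN_CONFIDENCE = 0.5  # predictions below this confidence are excluded


def primary_language(sentences):
    """Assign one language to a user from labelled sentences.

    sentences: iterable of (text, language, confidence) tuples for one user.
    Returns None when no sentence passes the filters. Ties are resolved in
    favour of the language seen first, which is an assumption.
    """
    counts = Counter(
        lang
        for text, lang, conf in sentences
        if len(text) >= MIN_CHARS and conf >= MIN_CONFIDENCE
    )
    if not counts:
        return None
    if len(counts) > 1:
        # English is dropped only when another language is also present.
        counts.pop("en", None)
    return counts.most_common(1)[0][0]


# Hypothetical labelled sentences for one user.
user_sentences = [
    ("Kaunis ilta Töölönlahdella", "fi", 0.98),
    ("Beautiful evening by the bay", "en", 0.95),
    ("Sunset run around the park", "en", 0.91),
    ("Wow!", "en", 0.40),  # too short and low confidence: ignored
]
print(primary_language(user_sentences))
```

In this example the user is assigned Finnish even though English sentences are more frequent, which mirrors the reasoning that English often appears alongside a user's first language.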

Spatial patterns of green space use
Where are the spatial hotspots of green space use? The volume and proportion of information from green spaces varied between the data sets (Table 2). In absolute numbers, Instagram was the most extensive data set across the city and from green spaces, while Flickr clearly had the highest ratio of records (posts) per user inside and outside green spaces (Table 2). PPGIS data sets had the highest proportion of posts located in green spaces, which is logical because the PPGIS questionnaires included direct questions about nature and recreation.
Our sample data sets for sports tracking data and mobile phone data did not contain spatially explicit information about the absolute number of individual users in specific areas. Based on the aggregated data sets across the city, we calculated that the proportion of users in green spaces was 7% in the Strava data and 3% for mobile phone data.
The spatial pattern of users in green spaces was rather different among the data sources (Fig. 4). The Jaccard index between data hotspots was below 0.40 among all data sets (Fig. 4; Table 3), also when testing with different threshold values for the hotspots (see Supplementary materials, Tables S1 and S2). The most similar hotspot patterns were between the two PPGIS data sets (J = 0.37). Furthermore, hotspots of the three different social media data sets were the most similar among each other (J = 0.30 between Instagram and Flickr, J = 0.25 between Instagram and Twitter and J = 0.20 between Flickr and Twitter). Between different types of data, the most similar hotspot distributions were between PPGISPark and Flickr (J = 0.23), mobile phone data and Strava sports tracking application data (J = 0.23) and PPGIS2050 and Instagram (J = 0.20). Overlap among the hotspots from different data sets is visualized in the Supplementary materials (Fig. S1 and S2).

Table 3. Jaccard similarity coefficient between hotspots of the different user-generated data sets in green spaces. Value 1 would indicate complete overlap and value 0 no overlap.

Temporal patterns of green space use
When are people using green spaces? Temporal patterns based on social media data, sports tracking data and mobile phone operator data reveal patterns of leisure time park use and commuting.
Aggregated 24-hour patterns of social media data extracted from green spaces show that most of the users share content in the afternoon and evening (Fig. 5a). In contrast, sports tracking data from green spaces show a clear diurnal activity pattern with peaks in the morning and afternoon (Fig. 5b). When looking only at the non-commuter trips ('leisure trips') in the Strava data, there is an activity peak in the late afternoon/early evening (Fig. 5b), similar to the pattern in social media data. Diurnal patterns are clear in the mobile phone data when inspecting individual grid squares in popular recreational and commuting areas, such as Central Park in Helsinki (Fig. 5c). When looking at all green space grid squares together, the share of users at each hour averages around 4% (Fig. 5c).
Weekday comparisons were possible only with social media data and sports tracking data. Data grouped by weekday show increased park use activity during the weekends both in social media data (Fig. 6a) and in leisure trip data from Strava (Fig. 6b).

Activities in green spaces
What are people doing in green spaces? The first step of our social media content analysis identified how much of the data subset from green spaces contained relevant information. For Flickr and Instagram, the majority of data in green spaces was classified as relevant (Flickr: 93%, Instagram: 83%), while the rest were classified as not relevant (Flickr: 5%, Instagram: 15%) or were not available at the time of data analysis (Flickr: 1%, Instagram: 2%). We found that most of the geotagged Twitter data from green spaces (74%) had been originally posted on Instagram (based on the media links, see Supplementary material S3 for details), and we did not conduct further content analysis on Twitter data. The images from regionally important green spaces fell into the main categories of landscape photos and activity photos rather similarly on Instagram and Flickr (Fig. 7). Flickr data contained a higher proportion of miscellaneous photos from green spaces including, for example, close-up images of plants. Mostly the same physical activities, such as cycling, jogging and dog walking, appeared in both Flickr and Instagram, but in different proportions (Fig. 8). Eating and drinking (for example, having a picnic) was also popular in both data sets. The activities observed from the Flickr data rely heavily on individual users, as there were only 38 users who had shared photos of activities in the classified data set. For example, one user had shared 196 photos of jogging from a single event. The exact number of users and photos per activity category in the classified sample data can be found in the Supplementary material (Table S3). All in all, the Instagram data set was more diverse in terms of the activities present in the photo content.

Languages in green spaces
Who are visiting urban green spaces, and who are producing the data? Language could be identified for 94% of users who had geotagged social media content to green spaces. The remaining users had posted content whose language could not be reliably identified (e.g. short sentences or low confidence) or that did not include any linguistic content (e.g. captions consisting of emojis only). In total, we identified 37 languages used by at least 10 unique users in green spaces (Appendix E). The most frequent languages were Finnish and English, followed by Russian and Swedish. Less frequently used languages included Japanese, German, Italian, Spanish, Estonian and French. A full list of languages and their users is available in the Supplementary materials (Table S4).
When comparing the spatial pattern of linguistic richness (Fig. 9) with the spatial hotspots in general (Fig. 4), green spaces with more users and touristic attractions are clearly richer in terms of languages used. Linguistic richness decreases in green spaces further away from the city centre. Finnish was the most used language in most green spaces, except for locations that also serve as touristic destinations, such as Töölö Bay.

Discussion

Similarities and differences among the user-generated data sources
Our results indicate that both active and passive user-generated geographic information have the potential to provide new insights about green space use and people's preferences towards green spaces. Fig. 10 aims to summarize the commonalities and differences among social media data, sports tracking data, mobile phone data and PPGIS from the perspective of understanding where, when, why and in what way people use green spaces and who these people are.
The radar chart describes the relative applicability of each data type for answering the different questions. We positioned each data source on the radar chart based on the sample data sets used in this study, and our evaluation of their applicability more broadly. For example, social media data, sports tracking data and mobile phone data provide rather high-quality data regarding the spatial and temporal aspects of green space use, but the amount of detail about activities in green spaces varies between data sources. The fitness for purpose of PPGIS data is very context-specific, as their information content depends strongly on the study design. PPGIS data often provide high-quality information for understanding who, what and why, and moderate accuracy about where, but little information about when in comparison to the other data types. Furthermore, the data sets can be used in combination with each other in order to gain a more comprehensive understanding about different questions and to potentially mitigate some of the biases in these data sets. Table 4 draws together the main aspects of each data source that should be considered when using these data.
Research and decision making can gain different perspectives about green space use from these data. Social media data depict particularly the patterns of leisure time green space use, being in the parks (What are people posting about, when and where?). Sports tracking data and mobile phone data can also capture the daily rhythm of moving through the parks, including every-day commuting (Where and when are people moving in green spaces?). The PPGIS survey data used in this study focused on green space values, perceiving the parks, and not on actual green space use (Who are valuing specific green spaces and why?). PPGIS data differ from the other more passively produced data sources in general, as the surveys can be designed according to information needs of planners and researchers. When using these data sources for research and planning, there are several questions related to data access, information content and biases to consider (see more details in Table 4).

Fig. 9. The map shows the number of used languages across green spaces of Helsinki, based on Instagram data from 2015. Bar graphs show the proportion of different languages in selected grid squares from different kinds of green spaces. Languages with less than 1.5% share per grid square are grouped as "other" in the bar graphs. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. An illustration of the potential applicability of different user-generated data sets for answering the questions where, when, what, why and who as interpreted by the authors. The further towards the corners of the radar chart, the higher the quality of the data for answering the different questions. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 4. Comparison of different user-generated data sources for studying the use of urban green spaces.
- WHEN is the daily activity peak in each area? Active athletes and younger male population overrepresented (Jestico et al., 2016; Oksanen et al., 2015).
Mobile phone data: data logged by mobile phone operators
- WHY are some green spaces valued more than others? Response bias (Brown, 2017); for example, more educated and enthusiastic citizens might be overrepresented among both randomly sampled and volunteer participants.

Exploring spatial and temporal patterns of green space use
Regarding the spatial dimension, all four data types provide relevant information about urban green spaces. Different data types highlight different hotspots and, based on the Jaccard index, the spatial overlap between the sample data sets is relatively low. Spatial patterns in social media data highlight the popular and meaningful places for leisure time green space use among locals and tourists. Social media data hotspots are mostly concentrated near the city centre and popular outdoor destinations, for example, in the Helsinki Central Park, Töölö Bay, Vanhakaupunki bay and Fortress of Suomenlinna (Figs. 3 and 4). Specific hotspots depend on the platform. The spatial accuracy attached to points-of-interest likely drives the spatial pattern of Instagram and Twitter data.
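The overlap measure mentioned above can be illustrated with a minimal sketch, assuming that hotspots from each data source have already been reduced to sets of grid-cell identifiers. The function name and the cell IDs below are hypothetical and are not taken from the study data; the sketch only shows how a Jaccard index (size of the intersection divided by size of the union) would be computed for two hotspot sets.

```python
def jaccard_index(cells_a, cells_b):
    """Return |A & B| / |A | B| for two collections of grid-cell IDs."""
    a, b = set(cells_a), set(cells_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty hotspot sets have no overlap.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical hotspot cells (illustrative IDs, not study data).
social_media_hotspots = {"c01", "c02", "c03", "c07"}
sports_tracking_hotspots = {"c03", "c07", "c11", "c12", "c15"}

overlap = jaccard_index(social_media_hotspots, sports_tracking_hotspots)
print(f"Jaccard index: {overlap:.3f}")
```

A value close to 0 indicates that two data sources highlight largely different hotspots, whereas a value of 1 would mean identical hotspot sets.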
Other data types are more spread out in comparison to social media data. Sports tracking data provide information at the highest spatial accuracy, revealing the use patterns across the path network based on the users' GPS tracks. Both sports tracking data and mobile phone data have the potential to capture actual routes in green spaces at different times of the day. Mobile phone data are especially fit for analysing population densities in bigger park areas such as the Central Park in Helsinki. It is, however, difficult to analyse the use patterns in smaller parks or at park edges based on mobile phone data due to the limited spatial resolution of the original network-based data. PPGIS data hotspots from the sample data sets reflect the areas about which residents wanted to share insights (Kahila-Tani et al., 2016) but which they did not necessarily visit. The temporal mismatch among some of the data sources might also be reflected in the spatial patterns.
Temporal analyses of social media data, sports tracking data and mobile phone data further demonstrate the different characteristics of the data sources and the importance of analysing several data sources together, as each of them reveals different types of use patterns at distinct temporal scales. Sports tracking data and mobile phone data track people's locations continuously, providing information on the diurnal park use at a finer temporal scale, including commuters passing through green spaces. The sports tracking data are limited to physical activities, highlighting the times of active exercise. Social media data, on the other hand, show clearly the leisure time use patterns, highlighting evenings and weekends when people have time to enjoy the park and share their experiences. Despite biases, insights collected from various data sources provide detailed and dynamic views on the spatial and temporal patterns of urban green space use, or dynamics of people in cities in general (Järv et al., 2018).
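As a rough illustration of this kind of temporal comparison, the sketch below counts observations per hour of the day, separately for weekdays and weekends. It assumes that each observation (a post, a tracked activity or a logged phone event) has a timestamp available as a Python `datetime`; the function names and the sample timestamps are hypothetical and do not reproduce the analysis carried out in the study.

```python
from collections import Counter
from datetime import datetime


def hourly_profile(timestamps):
    """Count observations per (day type, hour of day)."""
    counts = Counter()
    for ts in timestamps:
        # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        day_type = "weekend" if ts.weekday() >= 5 else "weekday"
        counts[(day_type, ts.hour)] += 1
    return counts


def peak_hour(counts, day_type):
    """Return the busiest hour for the given day type, or None if there is no data."""
    hours = {hour: n for (d, hour), n in counts.items() if d == day_type}
    return max(hours, key=hours.get) if hours else None


# Hypothetical timestamps (illustrative only, not study data).
posts = [
    datetime(2015, 6, 3, 18, 5),   # Wednesday evening
    datetime(2015, 6, 3, 18, 40),  # Wednesday evening
    datetime(2015, 6, 3, 8, 15),   # Wednesday morning
    datetime(2015, 6, 6, 14, 0),   # Saturday afternoon
    datetime(2015, 6, 6, 14, 30),  # Saturday afternoon
    datetime(2015, 6, 7, 11, 0),   # Sunday morning
]

profile = hourly_profile(posts)
print(peak_hour(profile, "weekday"), peak_hour(profile, "weekend"))
```

Computing the same profile for each data source makes it possible to compare, for example, evening and weekend peaks in social media data with commuting peaks in continuously logged data.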

Analysing activities in green spaces
Understanding what people do in green spaces or why they visit parks calls for more in-depth information beyond locations and timestamps. For example, sports tracking data often include information about the type of activity but are limited to certain sport activities, such as walking, jogging and cycling. Mobile phone data do not generally contain any direct information about people's activities, although comparing location data to other geographical data sets may give hints about what activities are taking place. Textual and visual content in social media data or well-designed questions in PPGIS have been increasingly used for understanding environmental preferences (Brown & Kyttä, 2014; Hausmann et al., 2018).
In the case of Helsinki, content analysis of social media data revealed a wide range of activities in green spaces. While we had no ground-truth data about the activities present in our study area, previous studies have shown that social media content likely captures the most popular activities in green spaces (Hausmann et al., 2018; Heikinheimo et al., 2017). Social media data are particularly useful for revealing emerging activities that are characteristic of a specific area, such as tree climbing in our study. However, new and special activities may become overrepresented on social media, as people are more prone to post about them than about daily activities like walking to work. The content analysis in this study was conducted manually, but automated content analysis methods are developing rapidly and provide more efficient tools for analysing large amounts of data (Richards & Tunçer, 2018; Toivonen et al., 2019). While automated content analysis of social media data helps to gain insights about revealed activities in a cost-efficient way, more traditional surveys and well-designed PPGIS studies remain relevant in recording stated activities and motivations in everyday green space use.

Characterizing green space users
Information about the characteristics of green space users (those who have produced the data) was rather restricted in all the sample data sources. Such information may come with the data, or background characteristics can be derived based on further analysis. We sought new information about this question by looking at different language groups in social media data. Across all parks in Helsinki, Finnish was the most popular language, but the language distribution varied across green spaces. Other sample data sets had only aggregated information or no information available about the users' language preferences.
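The language comparison can be sketched as follows, assuming that a language label has already been assigned to each post in a grid square (the language identification step itself is not shown). The 1.5% grouping threshold follows the caption of Fig. 9; the function name, language codes and counts are hypothetical and serve only as an illustration.

```python
from collections import Counter


def language_shares(languages, min_share=0.015):
    """Return the share of each language in one grid square.

    Languages whose share is below min_share are summed into "other".
    """
    counts = Counter(languages)
    total = sum(counts.values())
    shares = {}
    other = 0.0
    for lang, n in counts.items():
        share = n / total
        if share < min_share:
            other += share
        else:
            shares[lang] = share
    if other > 0:
        shares["other"] = other
    return shares


# Hypothetical posts from one grid square (illustrative only, not study data).
posts_in_square = ["fi"] * 70 + ["en"] * 20 + ["sv"] * 9 + ["et"] * 1

shares = language_shares(posts_in_square)
print(shares)
```

Repeating the calculation for each grid square gives the inputs for bar graphs of the kind described for Fig. 9, and counting the distinct languages per square gives the mapped value.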
In general, there are several ways to characterize park visitors based on user-generated geographic information. Further information about the users, such as age group, gender and place of residence, could be derived from social media data through user profile information or additional analysis (Toivonen et al., 2019). Sports tracking applications and mobile phone operators possess information about the account holder. For example, mobile phone roaming data can ideally reveal the movement of international visitors. However, personal data are predominantly not accessible to researchers due to privacy regulations, although exceptions do exist (Järv et al., 2015). In contrast, PPGIS surveys enable the collection of demographic background information, such as age and gender, from the respondents, but the surveys usually reach only locals. Combining user-generated data with traditional surveys on a sample population may help to gain reliable background information about the users (Ahas, Aasa, Silm, & Tiru, 2010). For example, collecting activity tracks from recruited participants allows linking the data with additional information from surveys for a focused sample of users (Vich et al., 2019).

Limitations with user-generated data
All user-generated data sets and related analysis come with biases that should be recognised in research (boyd et al., 2012). Three main issues require attention with the types of data presented in this study: the representativeness of, access to and ethical use of user-generated content.
None of the user-generated data sources included in this study were a random sample of green space visitors or their activities. The data were mostly a self-selected sample with inherent biases that can be difficult to measure and correct in the absence of ground-truth data. Popularity of social media platforms varies according to age group, and not everyone geotags their posts or shares content openly. Sports tracking data are known to be biased towards men over women (Oksanen et al., 2015), and the applications are mostly used by sports enthusiasts having different activity patterns compared to the rest of the population. Mobile phone data from a major operator provide the best representation of the urban population in countries where the penetration rate of mobile phones is high (Ahas, Silm, Saluveer, & Järv, 2009). PPGIS surveys usually aim at a representative sample, but as they are typically short in duration and response rates may be low and biased, good representation is not easy to achieve (Brown, 2017). Popular events and other local conditions might also affect the observed patterns. For example, one user had shared hundreds of photos from a single running event in our Flickr data set. Additionally, it is important to remember that user-generated geographic information depicts actions that took place, but does not reveal why some places were not visited or not marked on the map.
Data access is often the biggest obstacle for taking advantage of user-generated data in research (Ahas et al., 2008; Toivonen et al., 2019). Passively or indirectly contributed data sources from, for example, social media services, activity tracking apps or mobile operators are maintained by private companies. Private companies are not required to provide data access on a continuous basis, and the technological solutions for accessing the data may change or close without prior notice. For example, changes in social media APIs pose challenges for continuous data collection for research purposes (Lomborg & Bechmann, 2014). Sample data sets used in this study are not all from the same time period due to data access issues (see Supplement S1 for more details), which limits their comparability. Restricted access to proprietary data also limits the transparency and openness of the research process.
There are many ethical questions regarding the use of user-generated data sets in research which should be acknowledged (Zook, Barocas, boyd, Crawford, Keller, Gangadharan, Goodman, Hollander, Koenig, Metcalf, Narayanan, Nelson, & Pasquale, 2017). User-generated geographic information is often personal data, and analysing it without consent can be ethically problematic (Zwitter, 2014). With actively contributed data, such as PPGIS surveys, researchers can ask for relevant consent directly from the study participants, while publicly shared social media data or mobile phone records are often used without direct permission from the user. As the field is relatively new, legislation and ethical guidelines on using publicly but passively contributed data sets in research are still immature. Legislation also varies between countries and regions. Therefore, it is the responsibility of researchers and the research community to take appropriate measures to safeguard sensitive information when managing and analysing personal data. This includes guarding against the re-identification of individuals from data (Zook et al., 2017), and minimizing the amount of data that is stored.
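One possible way to reduce the amount of personal data stored is sketched below. It is only an illustration under stated assumptions and not the procedure used in this study: the record fields, the function name, the key and the coordinate precision are all hypothetical. The sketch replaces the user identifier with a keyed hash, coarsens the coordinates and the timestamp, and drops all other fields. Such pseudonymization lowers the risk of re-identification but does not by itself guarantee anonymity.

```python
import hashlib
import hmac
from datetime import datetime


def minimise_record(record, secret_key, decimals=3):
    """Keep only a pseudonymous user token, coarse coordinates and the hour."""
    token = hmac.new(
        secret_key, record["user_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]
    return {
        "user_token": token,
        # Fewer decimals give a coarser location; 3 decimals is an arbitrary choice.
        "lat": round(record["lat"], decimals),
        "lon": round(record["lon"], decimals),
        # Truncate the timestamp to the full hour.
        "hour": record["timestamp"].replace(minute=0, second=0, microsecond=0),
    }


# Hypothetical record (illustrative only, not study data).
raw = {
    "user_id": "example_user_1",
    "lat": 60.170833,
    "lon": 24.9375,
    "timestamp": datetime(2015, 6, 6, 14, 32, 10),
    "text": "Picnic in the park",
}
key = b"replace-with-a-secret-key"  # hypothetical key, to be stored separately from the data

stored = minimise_record(raw, key)
print(stored)
```

Using a keyed hash rather than a plain hash means that the tokens cannot be recomputed from known user names without the key, while records from the same user can still be grouped for analysis.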

Conclusions
In this paper, we compared the ability of different user-generated data sets to provide information on where, when and how people use and value urban green spaces. Our data source comparisons suggest that social media data, sports tracking data, mobile phone data and PPGIS data can provide valuable insights for understanding urban green space use and preferences, each contributing to the understanding in its own way. While some of the data sources are better suited to answer specific questions, the optimal approach would be to incorporate insights from different types of data, as each of the data sources has its limitations. Sports tracking data and mobile phone data help to monitor green space use at a fine temporal resolution throughout the day. Social media data offer a cost-effective source of information about popular and emerging leisure time activities among tourists and locals. PPGIS surveys can be designed to fill in specific information gaps. Despite the evident limitations in user-generated data sets, they are often the best available information about activities and preferences in green spaces.