Using Twitter data in urban green space research: A case study and critical evaluation



Green space
Urban green spaces are gaining increasing attention in both academic and policy arenas. The benefits they offer are increasingly significant in an urbanised society, given their potential to improve human and societal well-being (Chiesura, 2004). The benefits that nature and green space provide to human populations are broadly termed ecosystem services, encompassing the regulating, supporting, provisioning and cultural benefits that ecosystems provide (Costanza et al., 1997; Daily, 1997; Ehrlich & Ehrlich, 1981; MEA, 2005).
Urban green spaces benefit human populations in a myriad of ways (Keniger, Gaston, Irvine, & Fuller, 2013), as is clear from the wide range of ecosystem services that have been identified (Costanza et al., 1997; Daily, 1997; Ehrlich & Ehrlich, 1981; MEA, 2005). The benefit of relevance to this case study is the interactions they facilitate between nature and human individuals in the surrounding communities. Provision of, and engagement with, activity in such spaces is thought to facilitate social cohesion (Ewert & Heywood, 1991; Groenewegen, Berg, Vries, & Verheij, 2006), foster social empowerment (Westphal, 2003) and mitigate exclusion and isolation of certain groups (Seeland, Dübendorfer, & Hansmann, 2009; Shinew, Glover, & Parry, 2004). Green spaces have also been found to play a significant role in developing a sense of place, community identity and ownership of space (Kim & Kaplan, 2004). Interactions between people and nature in these spaces have likewise been found to benefit human well-being, reducing stress and encouraging pro-social and sustainable behaviours (Maller, Townsend, Pryor, Brown, & St Leger, 2006).
While these ecosystem services have been defined, there is a significant lack of understanding as to how these benefits are transferred to and received by urban populations and the circumstances under which this can happen most effectively.
Insight into the interactions between urban populations and green spaces can provide direction for planners to instigate schemes to improve the quality of human life which is imperative given the increasing urbanisation occurring across the globe. Such endeavours to manage urban green spaces most effectively require research founded on an evidence based approach. Similarly, such information can influence local authority decision making and encourage the incorporation of functional and usable green spaces into the urban environment for the benefit of its urban citizens.
Understanding how green spaces are used is a fundamental starting point in improving insights about their significance for human and societal well-being. Therefore, knowing when and how people are engaging with urban green spaces is important, as well as an awareness of the potential barriers that may be preventing their use. Often a space can have multiple, simultaneous functions for different groups of users and understanding these can help in the development of sustainable land use management strategies (Peng, Chen, Liu, Lu, & Hu, 2016).
Previous attempts to examine how and when human populations make use of urban green spaces have followed two methodological approaches. The first relies on a qualitative, report-based approach in which surveys, interviews and focus groups are used to gain information from participants on a range of topics. These have included the effects of deprivation on green space access (Jones, Hillsdon, & Coombes, 2009), features of green space which may promote increased use (Schipperijn, Stigsdotter, Randrup, & Troelsen, 2010), personal motivations and barriers to using green spaces (Gidlow & Ellis, 2011), attempts to assess the non-market economic value of green space (Lockwood & Tracey, 1995; Del Saz Salazar & Garcia Menendez, 2007) and the benefits people feel they receive from urban green space in times of heat stress (Lafortezza, Carrus, Sanesi, & Davies, 2009).
Self-reporting techniques have also been employed to assess the impact of park improvements on use and activity (Cohen et al., 2009b). Although employed extensively in research on urban green space use, report-based methods have a number of limitations. Because they rely inherently on participant responses, they may be subject to recall and social desirability biases (Evenson, Wen, Hillier, & Cohen, 2014), and the information received from participants is difficult to validate independently. Indeed, where questionnaire responses have been validated, disagreements have been found. For example, Evenson, Wen, Golinelli, Rodrigues, and Cohen (2012), using GPS monitors to validate responses, found only acceptable agreement (67–82%) between the actual and reported park visits of participants.
The second way in which urban green spaces have been examined utilises an observational approach, treating the urban green space as a study location while researchers record the visitors and activities on-going within it. While used less extensively than report-based methods, such approaches have been employed in a range of contexts related to urban green spaces; including investigations into the types of activities that occur in green spaces (Tzoulas & James, 2010), features of a park associated with physical activity (Kaczynski, Potwarka, & Saelens, 2008) and the influence of meteorological variables on green space use (Thorsson, Lindqvist, & Lindqvist, 2004). Studies using this method have also investigated the effect of race, age and gender (Cohen et al., 2007;West, 1989) on the use of neighbourhood parks and their significance as a location for physical activity. Specific protocols such as the SOPARC (System for Observing Play and Recreation in Communities) have also been developed in an attempt to produce a standardised approach to observational methods of park use (McKenzie, Cohen, Sehgal, Williamson, & Golinelli, 2006). SOPARC has been used in investigations of the differences between rural and urban park visits (Shores and West, 2010) and how the installation of fitness zones affects physical activity engagement in parks (Cohen, Marsh, Williamson, Golinelli, & McKenzie, 2010). Observational methods have a significant limitation, requiring multiple observations over different days and seasons to ensure reliability (Cohen et al., 2009a) as park use patterns between specific observation times cannot be reliably estimated. Significant time is therefore required and as a result studies utilising observations tend to lack longitudinal depth.
Consequently, achieving consistent measurements of human interactions with urban green spaces is challenging with current observational and self-report methods. Methodological progression and new approaches are required to overcome these limitations (Orr, Paskins, & Chaytor, 2014).

Crowdsourcing and social networks data
Various emerging technologies now have the potential to advance techniques for assessing human interaction with urban green spaces: how, when and why people use them, what activities occur within them, and how people feel while using them. Social networks and social media systems enable anyone connected to the internet to provide information about their current location, feelings and activities. As such, they provide a source of sensing and information that can be used to understand the motivational factors behind the habits of populations (Silva, Vaz de Melo, Almeida, Salles, & Loureiro, 2013). This is an example of crowdsourcing which, in its simplest form, refers to a group of people producing data that can be used by third parties to solve a problem (Estellés-Arolas & González-Ladrón-de-Guevara, 2012). In the context of this paper, the crowd comprises users of smart technology devices (Kleeman, Voss, & Rieder, 2008) who share information on social media platforms via the internet. The crowdsourcing of information is increasingly utilised as individuals become progressively more connected and accessible in the information age (Brand, 2012).
It is clear that the recent proliferation of mobile devices is key to crowdsourcing (Kanhere, 2011), especially when obtaining crowdsourced information from social media platforms: a mobile device enables anyone connected to the internet to share their information at any time. These social networks provide a platform to create a human-powered participatory sensing network (Demirbas, Bayir, Akcora, Yilmaz, & Ferhatosmanoglu, 2010) in which the mobile devices carried by the users become the nodes of the network, connected to provide continuous information to a server. A number of social networks are increasingly present in the day-to-day lives of millions of people around the world and have already been employed in an academic context (Su, Wan, Hu, & Cai, 2016).
Created and launched in 2006, Twitter is a free microblogging service which enables users to communicate through short statuses and messages of up to 140 characters in length. Anyone connected to the internet via a smart device or computer, and with a Twitter account has high speed access with the ability to receive and share information. This connectivity along with the large number of users makes Twitter a highly influential player in the distribution of information and opinion (Mathioudakis & Koudas, 2010). Its popularity is credited to the ability of users to gain insight into other users without having a connection with them (Russell, 2013;Suh, Hong, Pirolli, & Chi, 2010). Following a person on Twitter affords a user instant access to another person's profile without the need for the other to give permission, follow them back, or even be aware of them (Weng, Lim, Jiang, & He, 2010). Indeed many pages have open access status and do not require any sign up to Twitter, unlike other networks such as Facebook.
Information obtained from Twitter has already been used successfully in urban research. Tweet information has aided land use classifications of urban areas (Noulas, Scellato, Mascolo, & Pontil, 2011; Frias-Martinez & Frias-Martinez, 2014; Zhan, Ukkusuri, & Zhu, 2014) and has been used to investigate the emotional responses of people to urban spaces (Bifet & Frank, 2010; Hauthal & Burghardt, 2013; Klettner, Huang, Schmidt, & Gartner, 2013). It has also been shown to be useful in following how information spreads through urban areas (Malleson & Andresen, 2015; Yardi & Boyd, 2010), and extension apps can be used to monitor a range of environmental variables (Demirbas et al., 2010). This paper draws on the successes of such studies and sees them as justification for the inclusion of Twitter data in urban research. While Twitter data has been used to investigate cityscapes in general, it has not yet been applied to the study of urban green spaces despite their significance as components of the urban landscape. This paper provides a first introduction to the utility of Twitter data in the study of urban green space and the potential of crowdsourced information to improve understanding of such spaces and their importance for human populations. Twitter has been selected over other social networks because of the ease of accessing public data and the large number of users on the network generating this information.

Methodology, dataset and thematic analysis
The method described herein explores the potential of Twitter data as a source of information about human interactions with urban green spaces. To gain access to this publicly available data, it is necessary to connect to the Twitter API. RStudio was used as the interface through which the connection to the Twitter API was made. The 'twitteR' package is designed specifically for working with the Twitter API, and the necessary functions are already in place. Crucially, this method makes use of the OAuth protocol, which enables third-party researchers to access user data without gaining access to passwords and other private information (Hawker, 2010; Russell, 2013). Access through OAuth grants a third-party user an access token and an access token secret, which act as their credentials to access the user data. Using this package, a range of metadata is returned alongside the tweet text, as shown in Fig. 1.
To obtain the tweets for study, a search was made of Twitter's REST API using the park name as the search query (e.g. "Aston Park"). This was the only search term used, in order to obtain the full range of tweets related to each park and to avoid restricting or biasing the tweet responses towards certain types of activity. Tweets were then manually screened to ensure that those included in the sample were relevant and reported an interaction with the specified urban green space.
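The screening described above was carried out manually. As a minimal sketch of how a simple pre-filter might reduce that workload, the Python fragment below keeps only tweets whose text contains the park name, so that a reviewer screens a shorter list. It is not the procedure used in the study: the `text` and `id` field names, the helper functions and the sample tweets are all hypothetical and invented for illustration.

```python
import re

def mentions_park(text, park_name):
    """Return True if the text contains the park name as whole words.

    Matching is case-insensitive and tolerates extra whitespace
    between the words of the name.
    """
    words = [re.escape(w) for w in park_name.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def prefilter(tweets, park_name):
    """Keep tweets that mention the park; these still need manual screening."""
    return [t for t in tweets if mentions_park(t["text"], park_name)]

# Invented example tweets (not taken from the study dataset)
sample = [
    {"id": 1, "text": "Lovely picnic at Aston Park today!"},
    {"id": 2, "text": "Traffic is terrible near Aston this morning"},
    {"id": 3, "text": "Fun run at ASTON  PARK this Sunday"},
]
candidates = prefilter(sample, "Aston Park")
print([t["id"] for t in candidates])
```

A filter of this kind only establishes that the park is named. Whether a tweet actually reports an interaction with the green space remains a judgement for the manual screening step.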
A basic example is now presented demonstrating the utility of such Twitter data in the assessment of human use of urban green space. The study sites are located in Birmingham, the second largest city in the United Kingdom with an estimated population of 1.1 million (ONS, 2014). Within the metropolitan area, there are nearly 600 parks, public open spaces and nature reserves (BCC, 2016), the most of any European city. They provide an important resource for the surrounding populations in terms of their contribution to cultural ecosystem service provision.
Using Twitter data collected from tweets concerning urban green spaces in Birmingham, the range of information that can be obtained using crowdsourcing and social network data is illustrated. A case in point is the variety of organised activities found to occur in these green spaces; their importance for social interactions, economic opportunity and community identity is subsequently discussed. A thematic approach was taken in the subsequent analysis, bringing together themes from a number of literatures typically disengaged from the ecosystem services debate, including economics, social policy and cultural studies.

Results
Taking the summer months of 2015 (June, July and August) as the study period, 24 out of 46 urban parks and green spaces were identified as hosting one or more organised events. The locations of these parks are given in Fig. 2. Parks were chosen to create a sample that reflects the variety of parks in Birmingham. Parks of varying characteristics were selected based on their size, the presence of woodland and water bodies, the presence of a number of different amenities, and their status as Green Flag parks, Nature Reserves and Active Parks locations.
From a total of 2847 tweets received over the study period 793 tweets relating to 61 separate events were identified, shown in Fig. 3. Tweets were categorised manually based on an assessment of their text and image content into 11 categories, one of which encompassed organised events. Other categories included physical activity, non-physical activity, nature related activity, charitable activity, economic activity, volunteering, political and religious focused tweets and information based tweets. The organised events category has been utilised herein to ensure a robust number of tweets for analysis.
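Once tweets have been labelled by hand, tallying the categories is straightforward. The sketch below assumes each screened tweet has already been assigned a single category label; the labels in the toy list are invented, and only the two totals in the final lines (793 and 2847) come from the figures reported above.

```python
from collections import Counter

def category_shares(labels):
    """Map each category to (count, share of all labelled tweets)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.most_common()}

# Invented labels for six tweets, for illustration only
labels = [
    "organised event", "physical activity", "organised event",
    "nature related activity", "organised event", "volunteering",
]
shares = category_shares(labels)
print(shares["organised event"])

# Share of organised-event tweets, using the totals reported in the text
event_share = 793 / 2847
print(f"{event_share:.1%}")
```

On the reported totals, organised events account for roughly 28% of all tweets received, which is consistent with the decision to focus on this category to ensure a robust sample.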
The number of events recorded in each park ranged from 1 to 17 (Fig. 4). The highest number of tweets relating to a single event occurred at the Fusion Music Festival (Cofton Park) with a total of 335 tweets recorded. This is unsurprising given the size and popularity of the event with both locals and those from further afield who travel to the event, as was made clear in the tweets received.
The events identified provide opportunities for individuals to engage at a range of scales, from global events such as the Rugby World Cup (Eastside City Park), national events such as World Food Day (Cannon Hill Park) and the Summer Solstice (Lickey Hills), regional events such as the Big Hoot and Eid celebrations (Small Heath Park) and finally local events such as Edgbaston Regatta (Edgbaston Reservoir) and Acocks Green Carnival (Fox Hollies).
The overwhelming majority (80%) of the events identified were local events (Table 1), reflecting the important role of urban green spaces in providing a location where members of the local community can come together and socialise (Low, Taplin, & Scheld, 2009). Previous research has identified the role that urban green spaces play in developing an individual's sense of identity and feeling of connectedness with others (Kim & Kaplan, 2004). Such events may help to achieve this in the local communities in Birmingham. They may also be particularly important for older adults with limited mobility as being able to meet people in their local area is important for them to maintain social ties and a sense of connectedness to the community (Kweon, Sullivan, & Wiley, 1998).
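The proportion quoted above can be checked directly against the counts in Table 1. The short sketch below uses only those four counts; the variable names are illustrative.

```python
# Event counts by scale, as listed in Table 1
events_by_scale = {
    "Global": 3,
    "National": 3,
    "Regional/City wide": 6,
    "Local": 49,
}

total_events = sum(events_by_scale.values())
shares = {scale: 100 * n / total_events for scale, n in events_by_scale.items()}

for scale, pct in shares.items():
    print(f"{scale}: {pct:.1f}%")
```

The counts sum to the 61 events reported in the Results, and the local share rounds to the 80% stated in the text.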
From the tweets received it is also possible to identify the role of urban green spaces as places of engagement with a range of social, political and religious ideologies. For example, the Eid festival (Small Heath Park), Refugee week events, Vegan Picnic and Pankhurst Picnic (Cannon Hill Park) bring likeminded people together through a shared faith, or perspective. This again links to the importance of urban green spaces in the development of individual and community identity. In agreement with Mitchell (1995), urban green space is shown to be an important space into which different religious, social and political perspectives are brought and celebrated. Other cultural events were shown to occur, with music events such as the Birmingham Mela (Cannon Hill Park) providing the chance for communities to participate in cultural activities including music, dance and art.
Fig. 1. Example .csv file containing text and metadata information returned from the Twitter API.
The urban green spaces sampled also provided space for a range of activities aimed at facilitating social inclusion and empowerment for groups facing social isolation or other difficulties. Youth projects such as the Girls Youth Hub (Cannon Hill Park) and Sparkhill Youth Project (Sparkhill Park) have previously been identified as having an important role in social inclusion and integration of young people from a range of cultural backgrounds (Seeland et al., 2009). The opportunity that some events create for an individual to meet with others in a similar position to their own can also be beneficial. For example, Brummy Mummy Meetups enable new mothers to meet, socialise and discuss issues they may be facing.
The facilitation of interactions between humans and nature is an important role provided by urban green spaces (Maller et al., 2006). More natural environments have been found to have a restorative effect on cognition, providing an environment with fewer stressors and a variety of intriguing stimuli (Berman, Jonides, & Kaplan, 2008; Kaplan, 2001). Events which focus on bringing people into contact with nature such as Bioblitz, the Big Bog Lunch and flower planting help to facilitate these interactions and bring about improved mental well-being.
While the economic potential of urban green space has been considered extensively from the perspective of the whole city in terms of increasing land value, storm water management and nonmarket assets (Smardon, 1988;Del Saz Salazar & Garcia Menendez, 2007;Millward & Sabir, 2011), few studies have accounted for their role as discrete spaces for economic activity to take place. 40 of the events identified (66%) had an economic element, providing local businesses, charities or larger organisations with the chance to increase brand exposure and make financial gains through participating in them. Food festivals and summer fetes are a particular example where local businesses set up stalls and are provided with an opportunity for engagement with the local community. On a larger scale, music festivals such as Fusion Festival (Cofton Park) exemplify the opportunities provided to a range of sectors from entertainment, food, security and logistics. Charity events such as those taking place in association with Refugee Week (Cannon Hill Park) provide fundraising opportunities for charities as well as promoting pro-social behaviour and improving individual well-being (Thoits & Hewitt, 2001).
From the three month study period it was possible to elicit a large amount of information from Twitter as to the events occurring in the sampled urban green spaces. The implications these have for individual and societal well-being are discussed and bring together themes from a number of disciplines.

Critique
This discussion has explored the potential of using crowdsourced data from Twitter in explorations of human interactions with urban green spaces. This method was proposed following the identification of a number of limitations with the previous observational and self-report based methodologies employed; and the successful use of Twitter data in a range of urban related research. It is important to evaluate this method and highlight its utility compared to previously utilised methods, as well as identify any limitations which have become apparent in this use of Twitter as a data source to inform human use of urban green spaces.

Advantages of data collection using Twitter
A major criticism of observational approaches is that, to be reliable, they require extensive repeat measurements at the same location (Cohen et al., 2009a), incurring large time and cost expenditures. Twitter data do not have this limitation; indeed, tweets can be captured with ease as frequently as necessary, providing an opportunity for more measurements to be taken at no extra time or financial cost and achieving greater longitudinal depth as a result. In contrast to self-report based methods, tweets are often posted with a photograph, giving visual evidence for validation purposes.
The approach is also unobtrusive, requiring no participation from those observed, and is easy to replicate, improving the potential for a standardised approach to be developed. Being free, publicly available and instantly accessible, the data collection method using Twitter incurs no financial cost and takes significantly less time than previously used methods. The consistency of Twitter data in terms of the 140 character limit on tweets means that analysis is more straightforward than for other mixed media posts of varying lengths (Highfield & Leaver, 2014). These attributes mean Twitter data is well placed to provide information on the well-being, behaviours and activities occurring within communities (Nguyen et al., 2016).

Table 1. The scale of the events identified.

| Scale of event | No. of events of each scale |
| --- | --- |
| Global | 3 |
| National | 3 |
| Regional/City wide | 6 |
| Local | 49 |

Issues identified with using Twitter
Despite these benefits, the use of this method also raises some issues which should be taken into consideration for it to be utilised most effectively. The first is that crowdsourcing information via social networks limits the base population under investigation, and there is a need to discuss the inherent biases in these datasets (Hannay & Baatard, 2011). Crucially, those members of urban populations who do not own a smart device are excluded from the sample population. This can have implications for examining the use of space by particular sectors of the population, such as older people (aged 75+), who show disproportionate levels of non-engagement with these forms of technology (Zickuhr & Madden, 2012). Various spaces throughout the cityscape are supposed to be spaces where all members of the community can come together, but the users of social media do not reflect this diversity (Schwartz & Hochman, 2014) and are therefore not a truly representative cross section of the population. It should also be noted that very limited demographic information is known about the population from which tweets are received. Information about age, occupation or ethnicity is not available through the method described herein, which may limit the type of investigation that can be carried out. The metadata provided with the downloaded tweets can go some way to addressing this problem. For example, it is possible to ascertain the gender of Twitter users through a search of their profile name on Twitter.
Numerous studies have been undertaken to try to determine the types of people who engage actively with social media (Bendler, Wagner, Brandt, & Neumann, 2014; Coleman, Georgiadou, & Labonte, 2009) as a means to assess source credibility. As a general rule, extroverts tend to be more frequent users of social media (Correa, Hinsley, & De Zuniga, 2010), with adults (aged 18–49) making up an increasingly large proportion of those actively engaging through posts (Lenhart, Purcell, Smith, & Zickuhr, 2010). Subsections of the population may be missing from the received dataset because of the inherent biases in using this type of technology; however, because no demographic information is available, it is difficult to know the direction in which the sample is non-representative.
Methods based on crowdsourcing make use of the mobile devices through which people communicate and create a network. This raises issues that stem not from the data collection method itself but from the infrastructure on which it depends, and these must be taken into account to understand the limits on the data available for capture. The quality of internet connection can vary substantially between mobile networks, and signal may be intermittent in some areas. Limited or absent internet connection may produce areas with no recorded use, which may not be an accurate reflection of reality (Chatzimilioudis, Konstantinidis, Laoudias, & Zeinalipour-Yazti, 2012). While this can limit the production of data, in urban areas such as Birmingham poor internet connection and mobile phone coverage are unlikely to be problematic, as much of the city has 4G coverage. Appropriate selection of where this method is employed can overcome this obstacle to effective use.
User privacy and the ethics of obtaining data in the ways described herein are an important area of consideration when engaging with crowdsourcing through social media (Ma, Wei, Chai, & Xie, 2008; Burghardt, Buchmann, Müller, & Böhm, 2009; Vicente, Freni, Bettini, & Jensen, 2011). Being able to access the necessary information without compromising the privacy of the user is extremely important to users and researchers alike. This is not a significant issue for the method described in this paper, as only public Twitter accounts are used to provide information, i.e. those whose owners have enabled anyone to view their profile. The use of the OAuth process also addresses the need for privacy and data protection, ensuring no personal account details are accessible.

Improving the robustness of a Twitter captured dataset
It should also be noted that data gleaned from social networks are rarely produced with the aim of being utilised in scientific research. There may be inaccuracies in their narrative that seem inconsequential to the user but may have significant implications for the research output if utilised by the researcher (Flanagin & Metzger, 2008). While this paper has shown the potential of Twitter in generating a dataset suitable for investigating human interaction with urban green spaces, there are a number of ways in which the robustness of such a dataset can be improved for research.
Improvements could be made to the resultant dataset by actively engaging and encouraging people to tweet about a specific subject or location of interest. A tried and tested way in which this is achieved is to create a hashtag unique to the study which individuals could be encouraged to add to their tweets. This hashtag could then be inputted into the search query to pull out relevant tweets. This is already being utilised by political campaigns and commercial companies to enable the tracking of Twitter responses to their product or ideas and the creation of a cyber-community who interact together through the use of specific hashtags. While not a traditional approach to data collection, this inductive approach could be employed in the research community with project specific hashtags or accounts affording new opportunities and the creation of a more robust dataset. Success has already been seen to this end with the use of the @ecorecordings account encouraging citizen science engagement with nature sightings.
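As a minimal sketch of how a project-specific hashtag could be used to pull relevant tweets out of a downloaded set, the Python fragment below extracts hashtags from tweet text and keeps exact matches only. The hashtag `#BhamGreenSpace`, the `text` and `id` field names and the sample tweets are hypothetical, chosen purely for illustration.

```python
import re

# A hashtag is taken here to be '#' followed by letters, digits or underscores
HASHTAG = re.compile(r"#(\w+)")

def hashtags(text):
    """Return the hashtags in a tweet, lower-cased and without the '#'."""
    return [tag.lower() for tag in HASHTAG.findall(text)]

def with_hashtag(tweets, tag):
    """Keep tweets that carry exactly the given hashtag (case-insensitive)."""
    wanted = tag.lstrip("#").lower()
    return [t for t in tweets if wanted in hashtags(t["text"])]

# Invented example tweets and a hypothetical study hashtag
sample = [
    {"id": 1, "text": "Spotted a heron by the lake #BhamGreenSpace"},
    {"id": 2, "text": "Great gig last night #livemusic"},
    {"id": 3, "text": "Morning run done #bhamgreenspace #running"},
    {"id": 4, "text": "Council report on #BhamGreenSpaces published"},
]
matched = with_hashtag(sample, "#BhamGreenSpace")
print([t["id"] for t in matched])
```

Matching on the whole hashtag rather than on a substring avoids collecting tweets that use a longer, unrelated tag beginning with the same characters, as in the fourth invented example.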
With respect to the method described in this paper, which connects to the Twitter API, it is important to note that an exhaustive source of tweets is not returned. Any search made to the API returns only tweets produced in the last 10 days or so. One way to overcome this limitation is to make use of the Firehose API, a feed provided by Twitter that allows access to all public tweets. A significant obstacle to the use of the Firehose data, however, is the restrictive cost, as well as the amount of resources required to retain the data (servers, network availability and disk space). To ensure the maximum possible capture of tweets when using the Twitter API, it is advisable to make regular searches approximately every 10 days, improving the completeness of the dataset received. Examination of the metadata downloaded with the tweet text, shown in Fig. 1, can help to provide information about the tweets received. Information such as the time and date of creation and the name of the creator provides context to the dataset and improves robustness.
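Repeating the search at regular intervals means that consecutive result sets may overlap, so the batches need to be merged without double counting. The sketch below shows one way this might be done, assuming each tweet record carries a unique `id` and a `created` timestamp in a sortable text format; both field names, the helper functions and the sample records are assumptions made for illustration, not a description of the study's own workflow.

```python
from datetime import date, timedelta

def search_dates(start, end, interval_days):
    """List the dates from start to end (inclusive) at a fixed interval."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=interval_days)
    return dates

def merge_batches(*batches):
    """Combine search results, keeping the first copy of each tweet id."""
    merged = {}
    for batch in batches:
        for tweet in batch:
            merged.setdefault(tweet["id"], tweet)
    # Timestamps in year-month-day order sort chronologically as plain strings
    return sorted(merged.values(), key=lambda t: t["created"])

# A search every 10 days across the three-month study period
schedule = search_dates(date(2015, 6, 1), date(2015, 8, 31), 10)

# Invented results from two overlapping searches
first = [{"id": "101", "created": "2015-06-03 09:15:00"},
         {"id": "102", "created": "2015-06-09 18:40:00"}]
second = [{"id": "102", "created": "2015-06-09 18:40:00"},
          {"id": "103", "created": "2015-06-12 12:05:00"}]
dataset = merge_batches(first, second)
print(len(schedule), [t["id"] for t in dataset])
```

Because each search looks backwards in time, a final search shortly after the end of the study period would also be needed to capture its last days. Choosing an interval slightly shorter than the search window trades a little duplication, which the merge removes, for a lower risk of gaps.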

Conclusion
This paper has presented a method to investigate human use and interaction with urban green spaces following a critique of the current approaches employed to this end. Twitter is presented as a source of data which can be gathered through crowdsourcing. An example case study of 46 locations in Birmingham, UK has shown the potential of this new approach over a three month period. Twitter data was found to be successful in providing information about the range of organised events occurring in these urban green spaces, indicating the diversity in how urban populations make use of them. A high prevalence of local events was identified along with the provision of opportunity for engagement with regional and national events. The study sites were found to host a range of activities facilitating community engagement with social, cultural, political, religious and nature based events, while also providing space for a range of economic activities.
In comparison to the observational and subjective reporting methods which have been used previously in this area of research, the method presented herein offers a number of benefits. These include the free, publicly available and immediately accessible nature of Twitter data, the improved longitudinal depth that the method affords and the potential to produce a standardised procedure for investigations. It also addresses the time and cost constraints identified with previous methods.
While this method has been demonstrated successfully herein, there is a need to identify a number of limitations which must be addressed in order to utilise it most appropriately and effectively. These include privacy issues, biases in the received datasets and a lack of demographic information about the individuals included in the dataset.