Elsevier

Applied Geography

Volume 73, August 2016, Pages 77-88

Leveraging geotagged Twitter data to examine neighborhood happiness, diet, and physical activity

https://doi.org/10.1016/j.apgeog.2016.06.003

Highlights

  • We constructed novel neighborhood characteristics from geotagged Twitter data.

  • We estimated census tract indicators for happiness, diet and physical activity.

  • Manually labeled and algorithm-labeled tweets had excellent levels of agreement.

  • Twitter-derived variables correlated with tract sociodemographic characteristics.

  • Social media is a Big Data resource for cost-efficient neighborhood characterization.

Abstract

Objectives

Using publicly available, geotagged Twitter data, we created neighborhood indicators for happiness, food and physical activity for three large counties: Salt Lake, San Francisco and New York.

Methods

We analyzed 2.8 million tweets collected from February to August 2015. The geo-coordinates of where tweets were sent allowed us to spatially join them to 2010 census tract locations. We implemented quality control checks and tested associations between Twitter-derived variables and sociodemographic characteristics.
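The spatial join described above can be illustrated with a minimal point-in-polygon sketch. This is not the authors' implementation: the ray-casting test, the function names, and the two square "tracts" are assumptions chosen for illustration, and real census tract boundaries would normally be handled with a GIS library.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle `inside` each time a ray cast to the
    right of the point crosses a polygon edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_tract(point: Point, tracts: Dict[str, List[Point]]) -> Optional[str]:
    """Return the ID of the first tract containing the point, or None."""
    for tract_id, polygon in tracts.items():
        if point_in_polygon(point, polygon):
            return tract_id
    return None


# Hypothetical toy tracts (two adjacent unit squares), not real census geometry.
tracts = {
    "tract_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "tract_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
tweet_points = [(0.5, 0.5), (1.5, 0.2), (3.0, 3.0)]
joined = [assign_tract(p, tracts) for p in tweet_points]
print(joined)
```

Points that fall outside every polygon come back as `None`, which is one simple way to account for tweets that cannot be linked to a tract.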

Results

In a random subset of tweets, manual labels and algorithm-generated labels showed excellent agreement: 73% for happiness, 83% for food, and 85% for physical activity. Happy tweets, healthy food references, and physical activity references were less frequent in census tracts with greater economic disadvantage and higher proportions of racial/ethnic minorities and youths.
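As a rough illustration of how such an agreement figure can be computed, the sketch below calculates simple percent agreement between two label lists. It assumes agreement means the share of tweets given the same label by both sources; the function name and the eight toy labels are hypothetical and are not the study's data.

```python
def percent_agreement(manual, algorithm):
    """Share of items (in percent) given the same label by both sources."""
    if len(manual) != len(algorithm):
        raise ValueError("label lists must have the same length")
    if not manual:
        raise ValueError("label lists must not be empty")
    matches = sum(m == a for m, a in zip(manual, algorithm))
    return 100.0 * matches / len(manual)


# Hypothetical labels for eight tweets (illustration only).
manual_labels = ["happy", "not_happy", "happy", "happy",
                 "not_happy", "not_happy", "happy", "not_happy"]
algorithm_labels = ["happy", "not_happy", "not_happy", "happy",
                    "not_happy", "happy", "happy", "not_happy"]

agreement = percent_agreement(manual_labels, algorithm_labels)
print(f"{agreement:.1f}% agreement")
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic would be a separate design choice.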

Conclusions

Social media can be leveraged to provide greater understanding of the well-being and health behaviors of communities, information that has previously been difficult and expensive to obtain consistently across geographies. More open-access neighborhood data can enable better design of programs and policies addressing social determinants of health.

Introduction

The literature examining neighborhood effects on health has flourished in the last decade (Diez Roux, 2001). Extant research has provided evidence of associations between the neighborhood environment and mortality risk (Eames et al., 1993, Morris et al., 1996, Townsend et al., 1988, Tyroler et al., 1993, Waitzman and Smith, 1998a, Waitzman and Smith, 1998b, Wing et al., 1992), life expectancy (Clarke et al., 2010), mental health (Truong & Ma, 2006), self-rated health (Wen, Browning, & Cagney, 2003), obesity (Black et al., 2010, Heinrich et al., 2008, Mujahid et al., 2008, Smith et al., 2008), and diabetes (Grigsby-Toussaint et al., 2010, Lysy et al., 2013), even after adjusting for individual characteristics. Poor access to healthy food (Christiansen et al., 2013, Inagami et al., 2006, Morland, Wing, Diez Roux, & Poole, 2002, Morland, Wing, Roux, 2002, Stafford, 2007, Wang et al., 2007), the presence of fast food chains (Block, Scribner, & DeSalvo, 2004), the lack of recreational facilities (Brownson et al., 2009, Roemmich et al., 2006), and higher crime rates (Mujahid et al., 2008, Stafford, 2007) all correlate with higher obesity rates. Community happiness levels have also been inversely related to obesity and have been linked to other outcomes including hypertension, suicide, and life expectancy (Blanchflower and Oswald, 2008, Bray and Gunnell, 2006, Di Tella and MacCulloch, 2008, Dodds et al., 2011, Oswald and Powdthavee, 2007, Tella et al., 2003). Adverse neighborhood conditions concentrate in poor, minority neighborhoods (Black et al., 2010, Diez-Roux, 1998, Duncan et al., 1998, Macintyre et al., 1993), thereby increasing health disparities. Furthermore, the epidemic rise in obesity and related chronic diseases in recent decades signals the importance of structural forces and social processes.

Nonetheless, the dearth of data on contextual factors limits the investigation of multilevel effects on health. Extensive neighborhood data have been collected for certain places (National Archive of Criminal Justice, Baltimore Neighborhood Indicators Alliance – The Jacob France Institute), but these are the exception rather than the rule, and comparisons across geographies are difficult because the available measures vary greatly from place to place. Also, patterns seen in specific places may not apply elsewhere. For instance, estimates and patterns seen in urban areas may not apply to rural areas. Neighborhood data collection is expensive and time-consuming, and the resulting data are available only for certain places or time periods and become outdated quickly (Peterson and Krivo, 2000). Moreover, while comparable neighborhood data across large areas are largely lacking, the neighborhood data we do have typically describe compositional characteristics (e.g., percent female) and features of the built environment (e.g., number of grocery stores and health care clinics). These data do not capture the social environment or an individual’s interactions with that environment.

Social processes and networks can affect health via myriad mechanisms, such as 1) the maintenance of norms around healthy behaviors via informal social control; 2) the stimulation of new interests such as a new sport or exercise; 3) political advocacy for access to neighborhood amenities and protection against stressors and toxic agents; 4) emotional support; and 5) the dispersal of knowledge about health promotion practices (Ali et al., 2011, Berkman and Syme, 1979, Cohen et al., 2006, Kim et al., 2006, Vartanian et al., 2013). According to Social Learning Theory, learning takes place in a social context (Bandura, 1977). People adopt a behavior by observing how others perform it, the attitudes surrounding it, and the outcomes associated with it. Empirically, the adoption of specific health behaviors related to food consumption, health screening, smoking, alcohol consumption, drug use, and sleep has been observed to disperse through social networks (Keating et al., 2011, Mednick et al., 2010, Pachucki et al., 2011, Rosenquist et al., 2010, Roy, 2004, Smith and Christakis, 2008). Similarly, evidence suggests that emotional states such as mood (Kramer, Guillory, & Hancock, 2014), happiness (Fowler & Christakis, 2008), depression (Rosenquist, Fowler, & Christakis, 2011), and suicidality (Bearman & Moody, 2004) can spread through social networks. The measurement of area-level happiness and subjective well-being is a new and expanding research endeavor (Gallup-Healthways, 2013, Gill et al., 2008, Helliwell et al., 2012, Kramer, 2010, Quercia et al., 2012). For instance, in 2012, the United Nations began its annual release of a World Happiness Report (Helliwell et al., 2012). Social media may influence individuals’ health behaviors, but it may also offer a way to characterize prevalent community characteristics and patterns of behavior.

Given the vast literature documenting the influence of social networks on individual health behaviors and health outcomes, we believe that social media data represent an important new data resource for neighborhood researchers. Thus, using publicly available, geotagged Twitter data, we construct novel indicators of neighborhood happiness levels, healthiness of food, and physical activity. We conduct quality control activities and perform validation analyses comparing Twitter-derived neighborhood indicators to demographic and economic characteristics of the corresponding census tract. To test our computer algorithm for constructing neighborhood indicators, we selected three counties that are diverse with regard to geographical location, landscape, housing market, cultural characteristics, and demographic characteristics (e.g., racial/ethnic composition, age distribution, and household size). The three counties are Salt Lake County, San Francisco County, and New York County.
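A validation step of this kind can be sketched as follows: aggregate labeled tweets into a tract-level percentage, then correlate that percentage with a tract characteristic. The sketch is only an illustration under stated assumptions. The Pearson correlation, the variable names, and all numbers are hypothetical and do not reproduce the study's models or results.

```python
from collections import defaultdict
from math import sqrt


def tract_share(records):
    """Aggregate (tract_id, flag) pairs into the percent of flagged tweets per tract."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for tract_id, flag in records:
        totals[tract_id] += 1
        hits[tract_id] += int(flag)
    return {t: 100.0 * hits[t] / totals[t] for t in totals}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical labeled tweets: (tract ID, tweet was classified as happy).
labeled_tweets = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", True), ("B", False), ("B", False),
    ("C", True), ("C", False), ("C", False), ("C", False),
]
# Hypothetical tract characteristic, e.g. percent of households in poverty.
percent_poverty = {"A": 10.0, "B": 20.0, "C": 30.0}

happy_share = tract_share(labeled_tweets)
tract_ids = sorted(happy_share)
r = pearson([happy_share[t] for t in tract_ids],
            [percent_poverty[t] for t in tract_ids])
print(happy_share, round(r, 3))
```

In this toy data the happy-tweet share falls as the poverty measure rises, so the coefficient is negative; with real tracts, one would also want to account for tracts with very few tweets.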

Section snippets

Social media data collection

From February to August 2015, we used Twitter’s Streaming Application Programming Interface (API) to continuously collect a random 1% subset of publicly available tweets with latitude and longitude coordinates. We present in-depth analyses and findings for three counties in the United States: Salt Lake County (367,204 tweets); San Francisco County (same as San Francisco city; 653,670 tweets); and New York County (1,828,026 tweets).
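Keeping only tweets that carry coordinates can be sketched as a simple filter over collected records. The sketch assumes each tweet is a dictionary whose `coordinates` field, when present, holds a GeoJSON-style point ordered as `[longitude, latitude]`; the bounding box, helper names, and sample records are hypothetical, and the authors' actual collection code is not described in this excerpt.

```python
def extract_lon_lat(tweet):
    """Return (lon, lat) if the tweet carries a point location, else None."""
    geo = tweet.get("coordinates")
    if not geo or geo.get("type") != "Point":
        return None
    lon, lat = geo["coordinates"]
    return float(lon), float(lat)


def in_bbox(lon_lat, bbox):
    """Check whether a (lon, lat) pair lies inside (west, south, east, north)."""
    west, south, east, north = bbox
    lon, lat = lon_lat
    return west <= lon <= east and south <= lat <= north


# Hypothetical study-area bounding box (illustrative values only).
HYPOTHETICAL_BBOX = (-112.5, 40.0, -111.5, 41.0)

# Hypothetical collected records.
tweets = [
    {"id": 1, "coordinates": {"type": "Point", "coordinates": [-112.0, 40.7]}},
    {"id": 2, "coordinates": None},  # no location attached
    {"id": 3, "coordinates": {"type": "Point", "coordinates": [-74.0, 40.7]}},
]

kept_ids = []
for tweet in tweets:
    location = extract_lon_lat(tweet)
    if location is not None and in_bbox(location, HYPOTHETICAL_BBOX):
        kept_ids.append(tweet["id"])
print(kept_ids)
```

A bounding box is only a coarse pre-filter; assigning tweets to a county or tract still requires a polygon-based spatial join.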

Spatial join

We linked 99.8% of tweets with available GPS coordinates to

Results

Across the three counties, tweets were more likely to be neutral or positive rather than negative in sentiment (Table 1). Prevalence of happy tweets was highest in New York, followed by San Francisco and Salt Lake counties. About 3.1–6.6% of the tweets were food-related. Of these food tweets, about 16–17% mentioned healthy foods and 6–10% mentioned fast food restaurants. The mean (standard deviation) caloric density of food references was 250–261 calories (per 100 g). Food tweets that did not

Discussion

Social media is a massive data resource that is beginning to be leveraged for health research such as predicting flu (Centers for Disease Control and Prevention), predicting the onset of depression (De Choudhury, Gamon, Counts, & Horvitz, 2013), modeling outbreaks (Evans, Fast, & Markuzon, 2013), characterizing emergency response (Lamb, Paul, & Dredze, 2012), investigating spatial patterns in obesity-related tweets and their proximity to McDonald's (Ghosh & Guha, 2013), and

Acknowledgements

This work was supported by the following grants: Dr. Quynh Nguyen was PI on NIH grant 5K01ES025433; Dr. Feifei Li was supported by NSF grants 1443046 and 1251019, and in part by NSFC grant 61428204 and a Google research award. The funding sources did not have any involvement in the study design; collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.

References (86)

  • R. Di Tella et al.

    Gross national happiness as an answer to the Easterlin Paradox?

    Journal of Development Economics

    (2008)
  • C. Duncan et al.

    Context, composition and heterogeneity: Using multilevel models in health research

    Social Science & Medicine

    (1998)
  • S. Inagami et al.

    You are where you shop: Grocery store locations, weight, and neighborhoods

    American Journal of Preventive Medicine

    (2006)
  • D. Kim et al.

    US state- and county-level social capital in relation to obesity and physical inactivity: A multilevel, multivariable analysis

    Social Science & Medicine

    (2006)
  • Z. Lysy et al.

    The impact of income on the incidence of diabetes: A population-based study

    Diabetes Research and Clinical Practice

    (2013)
  • K. Morland et al.

    Neighborhood characteristics associated with the location of food stores and food service places

    American Journal of Preventive Medicine

    (2002)
  • J.N. Roemmich et al.

    Association of access to parks and recreational facilities with the physical activity of young children

    Preventive Medicine

    (2006)
  • J.P. Roy

    Socioeconomic status and health: A neurobiological perspective

    Medical Hypotheses

    (2004)
  • K.R. Smith et al.

    Walkability and body mass index: Density, design, and new diversity measures

    American Journal of Preventive Medicine

    (2008)
  • M. Stafford

    Pathways to obesity: Identifying local, modifiable determinants of physical activity and diet

    Social Science & Medicine

    (2007)
  • M. Wen et al.

    Poverty, affluence, and income inequality: Neighborhood economic structure and its implications for health

    Social Science & Medicine

    (2003)
  • M.J. Widener et al.

    Using geolocated Twitter data to monitor the prevalence of healthy and unhealthy food references across the US

    Applied Geography

    (2014)
  • B.E. Ainsworth et al.

    Compendium of physical activities: A second update of codes and MET values

    Medicine & Science in Sports & Exercise

    (2011)
  • M.M. Ali et al.

    Weight-Related Behavior among Adolescents: The Role of Peer Effects

    PLoS ONE

    (2011)
  • Baltimore Neighborhood Indicators Alliance – The Jacob France Institute. Vital Signs 11 Reports....
  • A. Bandura

    Social learning theory

    (1977)
  • P.S. Bearman et al.

    Suicide and friendships among American adolescents

    American Journal of Public Health

    (2004)
  • L. Berkman et al.

    Social networks, host resistance, and mortality: A nine-year follow-up study of Alameda County residents

    American Journal of Epidemiology

    (1979)
  • I. Bray et al.

    Suicide rates, life satisfaction and happiness as markers for population mental health

    Social Psychiatry and Psychiatric Epidemiology

    (2006)
  • R.C. Brownson et al.

    Measuring the built environment for physical activity: State of the science

    American Journal of Preventive Medicine

    (2009)
  • S.H. Burton et al.

    “Right time, right place” health communication on Twitter: Value and accuracy of location information

    Journal of Medical Internet Research

    (2012)
  • Centers for Disease Control and Prevention. CDC Competition Encourages Use of Social Media to Predict Flu....
  • N.S. Coulson et al.

    Coping with food allergy: Exploring the role of the online support group

    CyberPsychology & Behavior

    (2007)
  • M. De Choudhury et al.

    Predicting depression via social media

  • A.V. Diez Roux

    Investigating neighborhood and area effects on health

    American Journal of Public Health

    (2001)
  • A. Diez-Roux

    Bringing context back into epidemiology: Variables and fallacies in multi-level analysis

    American Journal of Public Health

    (1998)
  • P.S. Dodds et al.

    Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter

    PLoS ONE

    (2011)
  • M. Duggan et al.
  • M. Eames et al.

    Social deprivation and premature mortality: Regional comparison across England

    BMJ

    (1993)
  • EnchantedLearning.com. Food and Eating Vocabulary Word List. http://www.allaboutspace.com/wordlist/food.shtml Accessed...
  • J. Evans et al.

    Modeling the social response to a disease outbreak

  • J.H. Fowler et al.

    Dynamic spread of happiness in a large social network: Longitudinal analysis over 20 years in the Framingham heart study

    British Medical Journal

    (2008)
  • Gallup-Healthways. State of American Well-being: 2013 State Rankings and Analysis. http://info.healthways.com/wbi2013...