Leveraging geotagged Twitter data to examine neighborhood happiness, diet, and physical activity
Introduction
The literature examining neighborhood effects on health has flourished in the last decade (Diez Roux, 2001). Extant research has provided evidence on associations between the neighborhood environment and mortality risk (Eames et al., 1993, Morris et al., 1996, Townsend et al., 1988, Tyroler et al., 1993, Waitzman and Smith, 1998a, Waitzman and Smith, 1998b, Wing et al., 1992), life expectancy (Clarke et al., 2010), mental health (Truong & Ma, 2006), self-rated health (Wen, Browning, & Cagney, 2003), obesity (Black et al., 2010, Heinrich et al., 2008, Mujahid et al., 2008, Smith et al., 2008), and diabetes (Grigsby-Toussaint et al., 2010, Lysy et al., 2013), even after adjusting for individual characteristics. Poor access to healthy food (Christiansen et al., 2013, Inagami et al., 2006, Morland, Wing, Diez Roux, & Poole, 2002, Morland, Wing, Roux, 2002, Stafford, 2007, Wang et al., 2007), fast food chains (Block, Scribner, & DeSalvo, 2004), the lack of recreational facilities (Brownson et al., 2009, Roemmich et al., 2006), and higher crime rates (Mujahid et al., 2008, Stafford, 2007) all correlate with higher obesity rates. Community happiness levels have also been inversely related to obesity as well as other outcomes including hypertension, suicide, and life expectancy (Blanchflower and Oswald, 2008, Bray and Gunnell, 2006, Di Tella and MacCulloch, 2008, Dodds et al., 2011, Oswald and Powdthavee, 2007, Tella et al., 2003). Adverse neighborhood conditions concentrate in poor, minority neighborhoods (Black et al., 2010, Diez-Roux, 1998, Duncan et al., 1998, Macintyre et al., 1993), thereby increasing health disparities. Furthermore, the epidemic rise in obesity and related chronic diseases in recent decades signals the importance of structural forces and social processes.
Nonetheless, the dearth of data on contextual factors limits the investigation of multilevel effects on health. Certain places (National Archive of Criminal Justice, Baltimore Neighborhood Indicators Alliance – The Jacob France Institute) have extensive neighborhood data collected on them, but they are the exception rather than the rule, and it is difficult to make comparisons across geographies because available measures vary greatly across them. Also, patterns seen in specific places may not apply to other places. For instance, estimates and patterns seen in urban areas may not apply to rural areas. Neighborhood data collection is expensive and time-consuming, and the resulting data are available only for certain places or time periods and become outdated quickly (Peterson and Krivo, 2000). Moreover, while comparable neighborhood data across large areas are largely lacking, the neighborhood data we do have are typically data on compositional characteristics (e.g., percent female) and features of the built environment (e.g., number of grocery stores and health care clinics). These data do not capture the social environment, or an individual’s interactions with that environment.
Social processes and networks can affect health via a myriad of mechanisms, such as 1) the maintenance of norms around healthy behaviors via informal social control; 2) the stimulation of new interests such as a new sport or exercise; 3) political advocacy for access to neighborhood amenities and protection against stressors and toxic agents; 4) emotional support; and 5) the dispersal of knowledge about health promotion practices (Ali et al., 2011, Berkman and Syme, 1979, Cohen et al., 2006, Kim et al., 2006, Vartanian et al., 2013). According to Social Learning Theory, learning takes place in a social context (Bandura, 1977). Behaviors are adopted by observing how the behavior is performed by others, attitudes around that behavior, and outcomes associated with that behavior. Empirically, the adoption of specific health behaviors related to food consumption, health screening, smoking, alcohol consumption, drug use, and sleep has been observed to disperse through social networks (Keating et al., 2011, Mednick et al., 2010, Pachucki et al., 2011, Rosenquist et al., 2010, Roy, 2004, Smith and Christakis, 2008). Similarly, evidence suggests that emotional states such as mood (Kramer, Guillory, & Hancock, 2014), happiness (Fowler & Christakis, 2008), depression (Rosenquist, Fowler, & Christakis, 2011), and suicidality (Bearman & Moody, 2004) can spread through social networks. The measurement of area-level happiness and subjective well-being is a new and expanding research endeavor (Gallup-Healthways, 2013, Gill et al., 2008, Helliwell et al., 2012, Kramer, 2010, Quercia et al., 2012). For instance, in 2012, the United Nations began its annual release of a World Happiness Report (Helliwell et al., 2012). Social media may influence individuals’ health behaviors but may also be a way to characterize prevalent community characteristics and patterns of behaviors.
Given the vast literature documenting the influence of social networks on individual health behaviors and health outcomes, we believe that social media data represent an important new data resource for neighborhood researchers. Thus, using publicly available, geotagged Twitter data, we construct novel indicators of neighborhood happiness levels, healthiness of food, and physical activity. We conduct quality control activities and perform validation analyses comparing Twitter-derived neighborhood indicators to demographic and economic characteristics of the corresponding census tract. To test our computer algorithm for constructing neighborhood indicators, we selected three counties that are diverse with regard to geographical location, landscape, housing market, cultural characteristics, and demographic characteristics (e.g., racial/ethnic composition, age distribution, and household size). The three counties are Salt Lake County, San Francisco County, and New York County.
Social media data collection
From February to August 2015, we used Twitter’s Streaming Application Programming Interface (API) to continuously collect a random 1% subset of publicly available tweets with latitude and longitude coordinates. We present in-depth analyses and findings for three counties in the United States: Salt Lake County (367,204 tweets); San Francisco County (same as San Francisco city; 653,670 tweets); and New York County (1,828,026 tweets).
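As an illustration of the filtering step implied above, the following minimal sketch keeps only tweets that carry point coordinates. It is not the authors' collection code. It assumes each collected tweet is stored as one JSON object per line, with a GeoJSON-style `coordinates` field ordered as longitude, then latitude; the function name `geotagged_points` and the sample records are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def geotagged_points(lines: Iterable[str]) -> Iterator[Tuple[str, float, float]]:
    """Yield (tweet_id, latitude, longitude) for tweets with point coordinates.

    Assumption: one JSON object per line, with a GeoJSON-style field
    {"coordinates": {"type": "Point", "coordinates": [lon, lat]}}.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records instead of failing the whole run
        if not isinstance(tweet, dict):
            continue
        geo = tweet.get("coordinates")
        if not isinstance(geo, dict) or geo.get("type") != "Point":
            continue  # tweet has no exact point location
        coords = geo.get("coordinates")
        if not (isinstance(coords, list) and len(coords) == 2):
            continue
        lon, lat = coords  # GeoJSON order is longitude first
        yield str(tweet.get("id_str", "")), float(lat), float(lon)


# Hypothetical sample records, for illustration only.
sample = [
    '{"id_str": "1", "text": "hello", "coordinates": {"type": "Point", "coordinates": [-111.89, 40.76]}}',
    '{"id_str": "2", "text": "no location", "coordinates": null}',
    "not json",
]
points = list(geotagged_points(sample))
```

Only the first sample record survives the filter; the second has no coordinates and the third is not valid JSON.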
Spatial join
We linked 99.8% of tweets with available GPS coordinates to
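A spatial join of this kind can be sketched as a point-in-polygon test. The example below is a hedged illustration, not the authors' procedure: it assumes the target geography is the census tract (as the Introduction suggests), represents each tract as a simple polygon of (longitude, latitude) vertices, and uses the standard ray-casting test. The tract identifiers, the toy squares, and the function names are hypothetical; a production pipeline would more likely rely on a GIS library and official tract boundary files.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_tract(point: Point, tracts: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the ID of the first tract containing the point, or None."""
    for tract_id, polygon in tracts.items():
        if point_in_polygon(point, polygon):
            return tract_id
    return None


# Hypothetical toy tracts: two adjacent unit squares.
tracts = {
    "tract_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "tract_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
tweets = [(0.5, 0.5), (1.5, 0.25), (3.0, 3.0)]
assigned = [assign_tract(p, tracts) for p in tweets]
linked_share = sum(t is not None for t in assigned) / len(assigned)
```

Here `linked_share` plays the role of the linkage rate reported in the text: the fraction of geotagged tweets that fall inside some tract polygon. The linear scan over tracts is adequate for a toy example; a spatial index would be the usual choice at the scale of millions of tweets.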
Results
Across the three counties, tweets were more likely to be neutral or positive rather than negative in sentiment (Table 1). Prevalence of happy tweets was highest in New York, followed by San Francisco and Salt Lake counties. About 3.1–6.6% of the tweets were food-related. Of these food tweets, about 16–17% mentioned healthy foods and 6–10% mentioned fast food restaurants. The mean (standard deviation) caloric density of food references was 250–261 calories (per 100 g). Food tweets that did not
Discussion
Social media is a massive data resource that is beginning to be leveraged for health research, such as predicting flu (Centers for Disease Control and Prevention), predicting the onset of depression (De Choudhury, Gamon, Counts, & Horvitz, 2013), modeling outbreaks (Evans, Fast, & Markuzon, 2013), characterizing emergency response (Lamb, Paul, & Dredze, 2012), investigating spatial patterns in obesity-related tweets and their proximity to McDonald's (Ghosh & Guha, 2013), and
Acknowledgements
This work was supported by the following grants: Dr. Quynh Nguyen was PI on NIH grant 5K01ES025433; Dr. Feifei Li was supported by NSF grants 1443046 and 1251019, and in part by NSFC grant 61428204 and a Google research award. The funding sources did not have any involvement in the study design; collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.
References (86)
- et al. Neighborhoods and obesity in New York City. Health & Place (2010)
- et al. Hypertension and happiness across nations. Journal of Health Economics (2008)
- et al. Twitter reciprocal reply networks exhibit assortativity with respect to happiness. Journal of Computational Science (2012)
- et al. Fast food, race/ethnicity, and income: A geographic analysis. American Journal of Preventive Medicine (2004)
- et al. Does food environment influence food choices? A geographical analysis through “tweets”. Applied Geography (2014)
- et al. Environmental factors that impact the eating behaviors of low-income African American adolescents in Baltimore City. Journal of Nutrition Education and Behavior (2013)
- et al. Racial and social class gradients in life expectancy in contemporary California. Social Science & Medicine (2010)
- et al. Collective efficacy and obesity: The potential influence of social factors on health. Social Science & Medicine (2006)
- et al. Social support in cyberspace: A content analysis of communication within a Huntington’s disease online support group. Patient Education and Counseling (2007)
- et al. When are ghettos bad? Lessons from immigrant segregation in the United States. Journal of Urban Economics (2008)
- Gross national happiness as an answer to the Easterlin Paradox? Journal of Development Economics
- Context, composition and heterogeneity: Using multilevel models in health research. Social Science & Medicine
- You are where you shop: Grocery store locations, weight, and neighborhoods. American Journal of Preventive Medicine
- US state- and county-level social capital in relation to obesity and physical inactivity: A multilevel, multivariable analysis. Social Science & Medicine
- The impact of income on the incidence of diabetes: A population-based study. Diabetes Research and Clinical Practice
- Neighborhood characteristics associated with the location of food stores and food service places. American Journal of Preventive Medicine
- Association of access to parks and recreational facilities with the physical activity of young children. Preventive Medicine
- Socioeconomic status and health: A neurobiological perspective. Medical Hypotheses
- Walkability and body mass index: Density, design, and new diversity measures. American Journal of Preventive Medicine
- Pathways to obesity: Identifying local, modifiable determinants of physical activity and diet. Social Science & Medicine
- Poverty, affluence, and income inequality: Neighborhood economic structure and its implications for health. Social Science & Medicine
- Using geolocated Twitter data to monitor the prevalence of healthy and unhealthy food references across the US. Applied Geography
- Compendium of physical activities: A second update of codes and MET values. Medicine & Science in Sports & Exercise
- Weight-related behavior among adolescents: The role of peer effects. PLoS ONE
- Social learning theory
- Suicide and friendships among American adolescents. American Journal of Public Health
- Social networks, host resistance, and mortality: A nine-year follow-up study of Alameda County residents. American Journal of Epidemiology
- Suicide rates, life satisfaction and happiness as markers for population mental health. Social Psychiatry and Psychiatric Epidemiology
- Measuring the built environment for physical activity: State of the science. American Journal of Preventive Medicine
- “Right time, right place” health communication on Twitter: Value and accuracy of location information. Journal of Medical Internet Research
- Coping with food allergy: Exploring the role of the online support group. CyberPsychology & Behavior
- Predicting depression via social media
- Investigating neighborhood and area effects on health. American Journal of Public Health
- Bringing context back into epidemiology: Variables and fallacies in multi-level analysis. American Journal of Public Health
- Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter. PLoS ONE
- Social deprivation and premature mortality: Regional comparison across England. BMJ
- Modeling the social response to a disease outbreak
- Dynamic spread of happiness in a large social network: Longitudinal analysis over 20 years in the Framingham Heart Study. British Medical Journal