Can we vote with our tweet? On the perennial difficulty of election forecasting with social media
Introduction
The virtualization of human behavior has generated enormous quantities of data about what people do or say, to whom, and when. The presence of such data has naturally given rise to the desire to study past behavior, and ultimately, to forecast the future. High-profile commercial successes in advertising, sales, logistics, and other fields hint that it may be possible to fulfill this desire. If so, this offers a potential solution to political professionals and scholars who are struggling with the increasing cost and difficulty of traditional survey research and electoral polling.
However, our experience to date in forecasting important events, political or otherwise, has not always borne out the promise of new predictive powers. Success in predicting what people will want to buy, or what ads they might click on, has not translated into reliable predictions in sectors such as public health or finance. Research into other means of online opinion discovery, such as consumer rating systems, has shown them to be vulnerable to systematic bias. These difficulties have cast doubts on the potential for social media data to provide a new means of forecasting high-profile real-world events.
This paper discusses the history of electoral forecasting from social media and other online data. I note that all known examples of apparently successful forecasts have very quickly encountered problems that have undermined their predictive power. Their initial success at beating simple heuristics for electoral success, such as incumbency, has faded quickly when faced with the challenge of forward-looking election prediction. As a result, high-profile claims that social media data will soon supplant polling have had a short shelf life. Hence, while social media and other online data may be useful for studying how citizens behave online, they have proven rather less useful for forecasting what those same citizens–and, more importantly, their offline fellows–will do back in the real world.
I conclude by discussing whether these problems can be overcome. I agree with earlier critiques of the election forecasting literature, which have recommended fundamental changes in the transparency, disclosure, method, and publication of electoral forecasts that rely on social media. However, I raise the question of whether we should ever expect reliable election forecasts even after such changes. Because social media forecasting lacks the control over the data-generating process that polling enjoys, it faces enormous instability in the origin, sample, structure, and content of the data on which it relies. This instability should undermine easy claims that social media-based forecasts will replace polls any time soon. Instead, social media may prove to be only one more channel for the application of traditional polling methods, rather than a revolution in social prediction.
Section snippets
Forecasting the political
From what began as a catalogue of human knowledge, the internet has become an instrument for monitoring human behavior. Using the data from that instrument, new companies have successfully measured and predicted a range of online and offline behaviors. The most notable successes include radically more effective ad targeting, translation, and email spam detection (Google), the discovery and recommendation of professional (LinkedIn) and social (Facebook) contacts, early detection of and marketing
A multi-cycle example of failure
The author can now add his own experience to the catalog of failed attempts at election prediction. Over the 2010–2012 election cycle, we performed what we believe to be the first multi-cycle experiment in forecasting elections in the US House of Representatives. The sampling strategy was designed to circumvent some of the problems inherent in using the Streaming API. We made and disclosed forecasts for the 2012 House elections in real time during the campaign, based on algorithms and data
Towards better political forecasts
Given this pattern of failure, and the thorny root causes thereof, what might improve the odds of building successful social media-based forecasts of election outcomes? Gayo-Avello (2012) attempted to define a baseline set of criteria for serious election forecasts from Twitter data. As a minimum, they recommend:
1. Forecasters should have a well-defined hypothesis as to why their method should work.
2. Ahead of time, forecasters should define the right baseline–usually the rate of incumbent
Conclusions: on the difficulty of forecasting
Headline successes in forecasting the offline world from online data have relied on events that are atomic, exchangeable, frequent, and low-risk. When forecasting models have departed from these characteristics–as with Google Flu or attempts at stock market forecasting–they have often run into serious difficulties. The assumption of a relatively stable link between online and offline behavior appears to work reasonably well when forecasting purchasing behaviors, estimating advertising
Acknowledgments
Enormous credit and thanks are due to Len DeGroot of the Graduate School of Journalism at the University of California, Berkeley, for hosting the real-time publication of predictions during the 2012 election; and to Hillary Sanders for invaluable research support. Earlier versions of this paper benefited from comments and feedback from F. Daniel Hidalgo, Jasjeet Sekhon, and participants at the 2011 UC Berkeley Research Workshop in American politics. Chris Diehl, Dave Gutelius, Stu Feldman, and
References (50)
- et al. (2015). Forecasting elections with non-representative polls. International Journal of Forecasting.
- Barberá, P. (2012). Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. APSA...
- Beauchamp, N. (2013). Predicting and interpolating state-level polling using Twitter textual data. Working paper,...
- et al. (2011). On using Twitter to monitor political sentiment and predict election results. Sentiment Analysis where AI meets Psychology (SAAIP).
- Don’t feed the trolls! Managing troublemakers in magazines’ online communities. Journalism Practice (2012).
- et al. (2003). Latent Dirichlet Allocation. The Journal of Machine Learning Research.
- et al. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment.
- et al. (2012). A 61-million-person experiment in social influence and political mobilization. Nature.
- et al. (2013). Sampling cell phone only households: A comparison of demographic and behavioral characteristics from ABS and cell phone samples. Survey Practice.
- et al. (2013). Do online advertisements increase political candidates’ name recognition or favorability? Evidence from randomized field experiments. Political Behavior.
- When Google got flu wrong. Nature.
- Elections and the regression discontinuity design: Lessons from close US House races, 1942–2008. Political Analysis.
- Predicting the present with Google trends. Economic Record.
- Detecting influenza epidemics using search engine query data. Nature.
- Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis.
- Why the Pirate Party won the German election of 2009 or the trouble with predictions: A response to Tumasjan, A., Sprenger, T.O., Sander, P.G., & Welpe, I.M. “Predicting elections with Twitter: What 140 characters reveal about political sentiment”. Social Science Computer Review.
- Gauging the impact of growing nonresponse on estimates from a national RDD telephone survey. Public Opinion Quarterly.