Elsevier

International Journal of Forecasting

Volume 31, Issue 3, July–September 2015, Pages 992-1007
Can we vote with our tweet? On the perennial difficulty of election forecasting with social media

https://doi.org/10.1016/j.ijforecast.2014.08.005

Abstract

Social media and other “big” data promise new sources of information for tracking and forecasting electoral contests in democratic societies. This paper discusses the use of social media, and Twitter in particular, for forecasting elections in the United States, Germany, and other democracies. All known forecasting methods based on social media have failed when subjected to the demands of true forward-looking electoral forecasting. These failures appear to be due to fundamental properties of social media, rather than to methodological or algorithmic difficulties. In short, social media do not, and probably never will, offer a stable, unbiased, representative picture of the electorate; and convenience samples of social media lack sufficient data to fix these problems post hoc. Hence, while these services may, as others in this volume discuss, offer new ways of reaching prospective voters, the data that they generate will not replace polling as a means of assessing the sentiment or intentions of the electorate.

Introduction

The virtualization of human behavior has generated enormous quantities of data about what people do or say, to whom, and when. The presence of such data has naturally given rise to the desire to study past behavior, and ultimately, to forecast the future. High-profile commercial successes in advertising, sales, logistics, and other fields hint that it may be possible to fulfill this desire. If so, this offers a potential solution to political professionals and scholars who are struggling with the increasing cost and difficulty of traditional survey research and electoral polling.

However, our experience to date in forecasting important events, political or otherwise, has not always borne out the promise of new predictive powers. Success in predicting what people will want to buy, or what ads they might click on, has not translated into reliable predictions in sectors such as public health or finance. Research into other means of online opinion discovery, such as consumer rating systems, has shown them to be vulnerable to systematic bias. These difficulties have cast doubts on the potential for social media data to provide a new means of forecasting high-profile real-world events.

This paper discusses the history of electoral forecasting from social media and other online data. I note that all known examples of apparently successful forecasts have very quickly encountered problems that have undermined their predictive power. Their initial success at beating simple heuristics for electoral success, such as incumbency, has faded quickly when faced with the challenge of forward-looking election prediction. As a result, high-profile claims that social media data will soon supplant polling have had a short shelf life. Hence, while social media and other online data may be useful for studying how citizens behave online, they have proven rather less useful for forecasting what those same citizens–and, more importantly, their offline fellows–will do back in the real world.

I conclude by discussing whether these problems can be overcome. I agree with earlier critiques of the election forecasting literature, which have recommended fundamental changes in the transparency, disclosure, method, and publication of electoral forecasts that rely on social media. However, I raise the question of whether we should ever expect reliable election forecasts even after such changes. Because social media forecasting lacks the control over the data-generating process that polling enjoys, it faces enormous instability in the origin, sample, structure, and content of the data on which it relies. This instability should undermine easy claims that social media-based forecasts will replace polls any time soon. Instead, social media may prove to be only one more channel for the application of traditional polling methods, rather than a revolution in social prediction.

Forecasting the political

From what began as a catalogue of human knowledge, the internet has become an instrument for monitoring human behavior. Using the data from that instrument, new companies have successfully measured and predicted a range of online and offline behaviors. The most notable successes include radically more effective ad targeting, translation, and email spam detection (Google), the discovery and recommendation of professional (LinkedIn) and social (Facebook) contacts, early detection of and marketing

A multi-cycle example of failure

The author can now add his own experience to the catalog of failed attempts at election prediction. Over the 2010–2012 election cycle, we performed what we believe to be the first multi-cycle experiment in forecasting elections in the US House of Representatives. The sampling strategy was designed to circumvent some of the problems inherent in using the Streaming API. We made and disclosed forecasts for the 2012 House elections in real time during the campaign, based on algorithms and data

Towards better political forecasts

Given this pattern of failure, and the thorny root causes thereof, what might improve the odds of building successful social media-based forecasts of election outcomes? Gayo-Avello (2012) attempted to define a baseline set of criteria for serious election forecasts from Twitter data. As a minimum, they recommend:

  1. Forecasters should have a well-defined hypothesis as to why their method should work.

  2. Ahead of time, forecasters should define the right baseline–usually the rate of incumbent
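The idea of fixing a baseline before judging a forecast can be illustrated with a minimal sketch. It assumes the baseline is simply "the incumbent party holds the seat"; the `Race` record, the function names, and the five races are hypothetical and invented for illustration, not data or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class Race:
    incumbent: str  # party holding the seat before the election
    forecast: str   # party a (hypothetical) social media model predicted to win
    winner: str     # party that actually won


def accuracy(predictions, outcomes):
    """Share of races in which the prediction matched the outcome."""
    pairs = list(zip(predictions, outcomes))
    return sum(p == o for p, o in pairs) / len(pairs)


def compare_to_incumbency_baseline(races):
    """Score the model against the 'incumbent party wins' heuristic."""
    outcomes = [r.winner for r in races]
    baseline = accuracy([r.incumbent for r in races], outcomes)
    model = accuracy([r.forecast for r in races], outcomes)
    return {"baseline": baseline, "model": model, "lift": model - baseline}


# Made-up races, for illustration only.
races = [
    Race(incumbent="D", forecast="D", winner="D"),
    Race(incumbent="R", forecast="R", winner="R"),
    Race(incumbent="D", forecast="R", winner="D"),
    Race(incumbent="R", forecast="R", winner="D"),
    Race(incumbent="R", forecast="R", winner="R"),
]

result = compare_to_incumbency_baseline(races)
print(result)
```

In this invented example the model is right in three of five races while the incumbency heuristic is right in four, so the lift is negative. A forecast is only interesting if its lift over the pre-declared baseline is positive on elections it has not yet seen.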

Conclusions: on the difficulty of forecasting

Headline successes in forecasting the offline world from online data have relied on events that are atomic, exchangeable, frequent, and low-risk. When forecasting models have departed from these characteristics–as with Google Flu or attempts at stock market forecasting–they have often run into serious difficulties. The assumption of a relatively stable link between online and offline behavior appears to work reasonably well when forecasting purchasing behaviors, estimating advertising

Acknowledgments

Enormous credit and thanks are due to Len DeGroot of the Graduate School of Journalism at the University of California, Berkeley, for hosting the real-time publication of predictions during the 2012 election; and to Hillary Sanders for invaluable research support. Earlier versions of this paper benefited from comments and feedback from F. Daniel Hidalgo, Jasjeet Sekhon, and participants at the 2011 UC Berkeley Research Workshop in American politics. Chris Diehl, Dave Gutelius, Stu Feldman, and

References (50)

  • Wang, W., et al. (2015). Forecasting elections with non-representative polls. International Journal of Forecasting.
  • Barberá, P. (2012). Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. APSA...
  • Beauchamp, N. (2013). Predicting and interpolating state-level polling using Twitter textual data. Working paper,...
  • Bermingham, A., et al. (2011). On using Twitter to monitor political sentiment and predict election results. Sentiment Analysis where AI meets Psychology (SAAIP).
  • Binns, A. (2012). Don’t feed the trolls! Managing troublemakers in magazines’ online communities. Journalism Practice.
  • Blei, D., et al. (2003). Latent Dirichlet Allocation. The Journal of Machine Learning Research.
  • Blondel, V. D., et al. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment.
  • Bond, R. M., et al. (2012). A 61-million-person experiment in social influence and political mobilization. Nature.
  • Boyle, J., et al. (2013). Sampling cell phone only households: A comparison of demographic and behavioral characteristics from ABS and cell phone samples. Survey Practice.
  • Broockman, D. E., et al. (2013). Do online advertisements increase political candidates’ name recognition or favorability? Evidence from randomized field experiments. Political Behavior.
  • Butler, D. (2013). When Google got flu wrong. Nature.
  • Caughey, D., et al. (2011). Elections and the regression discontinuity design: Lessons from close US House races, 1942–2008. Political Analysis.
  • Choi, H., et al. (2012). Predicting the present with Google trends. Economic Record.
  • Christian, L., Keeter, S., Purcell, K., & Smith, A. (2010). Assessing the cell phone challenge to survey research in...
  • Conover, M. D., Ratkiewicz, J., Francisco, M., Goncalves, B., Flammini, A., & Menczer, F. (2011). Political...
  • Copeland, P., Romano, R., Zhang, T., Hecht, G., Zigmond, D., & Stefansen, C. (2013). Google disease trends: an update....
  • DiGrazia, J., McKelvey, K., Bollen, J., & Rojas, F. (2013). More tweets, more votes: Social media as a quantitative...
  • Gayo-Avello, D. (2012). I wanted to predict elections with Twitter and all I got was this lousy paper: a balanced...
  • Gayo-Avello, D., Metaxas, P. T., & Mustafaraj, E. (2011). Limits of electoral predictions using Twitter. In Proceedings...
  • Ginsberg, J., et al. (2008). Detecting influenza epidemics using search engine query data. Nature.
  • Grimmer, J., et al. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis.
  • Hill, K. (2012). How Target figured out a teen girl was pregnant before her father did. Forbes.com, 16...
  • Huberty, M. (2013). Multi-cycle forecasting of congressional elections with social media. In Workshop on politics,...
  • Jungherr, A., et al. (2012). Why the Pirate Party won the German election of 2009 or the trouble with predictions: A response to Tumasjan, A., Sprenger, T.O., Sander, P.G., & Welpe, I.M. “Predicting elections with Twitter: What 140 characters reveal about political sentiment”. Social Science Computer Review.
  • Keeter, S., et al. (2006). Gauging the impact of growing nonresponse on estimates from a national RDD telephone survey. Public Opinion Quarterly.