ABSTRACT
With public information widely accessible and shared on today's web, greater insight is possible into crowd actions by citizens and non-state actors, such as large protests and cyber activism. We present efforts to predict the occurrence, specific timeframe, and location of such actions before they occur, based on public data collected from over 300,000 open-content web sources in 7 languages from all over the world, ranging from mainstream news to government publications to blogs and social media. Using natural language processing, we extract event information from this content: the type of event, the entities involved and their roles, sentiment and tone, and the time range in which the discussed event occurs. Statements on Twitter that refer to a date later than their time of posting prove particularly indicative. We consider in particular the case of the 2013 Egyptian coup d'état. The study validates and quantifies the common intuition that data on social media (beyond mainstream news sources) can predict major events.
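The forward-looking signal described in the abstract can be illustrated with a minimal sketch. It assumes event mentions have already been extracted into records holding a posting date, the date the statement refers to, and a location. The `Mention` record, both function names, and the simple count threshold are hypothetical choices made for illustration; they are not the model used in the study.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class Mention:
    """One extracted statement about an event (hypothetical record layout)."""
    posted: date      # when the statement was published
    event_date: date  # date on which the statement says the event occurs
    location: str     # location tag attached to the event


def forward_mention_counts(mentions: Iterable[Mention], as_of: date) -> Counter:
    """Count statements published by `as_of` that refer to a later date.

    Keys are (location, event_date) pairs. Statements about the past or
    present, and statements not yet published at `as_of`, are ignored.
    """
    counts: Counter = Counter()
    for m in mentions:
        if m.posted <= as_of < m.event_date:
            counts[(m.location, m.event_date)] += 1
    return counts


def flag_dates(counts: Counter, threshold: int) -> list:
    """Return the (location, event_date) pairs with at least `threshold` mentions.

    A fixed threshold is an assumption of this sketch, standing in for
    whatever trained classifier a real system would use.
    """
    return sorted(key for key, n in counts.items() if n >= threshold)
```

Counting only statements published on or before the `as_of` date keeps the signal usable for prediction, since nothing posted later leaks into the count.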
Index Terms
- Predicting crowd behavior with big public data