ABSTRACT
Many previous techniques identify trending topics in social media, even topics that are not pre-defined. We present a technique to identify trending rumors, which we define as topics that include disputed factual claims. Putting aside any attempt to assess whether the rumors are true or false, it is valuable to identify trending rumors as early as possible. It is extremely difficult to classify accurately whether each individual post is making a disputed factual claim. Instead, we identify trending rumors by recasting the problem as finding entire clusters of posts whose topic is a disputed factual claim.
The key insight is that when there is a rumor, even though most posts do not raise questions about it, there may be a few that do. If we can find signature text phrases that a few people use to express skepticism about factual claims and that are rarely used to express anything else, we can use those phrases as detectors for rumor clusters. Indeed, we have found a few phrases that seem to be used in exactly that way, including "Is this true?", "Really?", and "What?". Relatively few posts related to any particular rumor use any of these enquiry phrases, but many rumor diffusion processes include some posts that do, and those posts appear quite early in the diffusion.
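As a rough illustration of the detector idea, the sketch below flags posts that contain one of the three enquiry phrases quoted above. The regular expressions and the `is_enquiry` helper are assumptions made for this example, not the authors' actual pattern set.

```python
import re

# Hypothetical patterns covering only the three phrases quoted in the abstract.
ENQUIRY_PATTERNS = [
    re.compile(r"\bis this true\s*\?", re.IGNORECASE),
    re.compile(r"\breally\s*\?", re.IGNORECASE),
    re.compile(r"\bwhat\s*\?", re.IGNORECASE),
]


def is_enquiry(post: str) -> bool:
    """Return True if the post contains any of the enquiry phrases."""
    return any(pattern.search(post) for pattern in ENQUIRY_PATTERNS)


if __name__ == "__main__":
    sample_posts = [
        "Is this true? They say the bridge collapsed",
        "The bridge is closed for scheduled repairs",
    ]
    for post in sample_posts:
        print(is_enquiry(post), post)
```

Requiring the question mark keeps ordinary uses such as "not really sure" or "what is happening" from matching, in keeping with the aim of phrases that are rarely used to express anything else.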
We have developed a technique based on searching for the enquiry phrases, clustering similar posts together, and then collecting related posts that do not contain these simple phrases. We then rank the clusters by their likelihood of actually containing a disputed factual claim. The detector, which searches for the very rare but very informative phrases, combined with clustering and a classifier on the clusters, yields surprisingly good performance. On a typical day of Twitter, about a third of the top 50 clusters were judged to be rumors, a precision high enough that human analysts might be willing to sift through them.
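The steps described above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the greedy Jaccard clustering, the `threshold` value, and ranking by the number of enquiry posts are stand-ins chosen for illustration, since the abstract does not specify the similarity measure or the classifier used on the clusters.

```python
import re

# Hypothetical detector for the three phrases quoted in the abstract.
ENQUIRY = re.compile(r"\b(is this true|really|what)\s*\?", re.IGNORECASE)


def tokens(post):
    """Lowercased word set used for the similarity comparison."""
    return set(re.findall(r"[a-z0-9']+", post.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_cluster(toks, clusters, threshold):
    """Return the most similar cluster if it clears the threshold, else None."""
    best = max(clusters, key=lambda c: jaccard(toks, c["vocab"]), default=None)
    if best is not None and jaccard(toks, best["vocab"]) >= threshold:
        return best
    return None


def detect_rumor_clusters(posts, threshold=0.3):
    # Step 1: search for posts containing an enquiry phrase.
    signal = [p for p in posts if ENQUIRY.search(p)]
    others = [p for p in posts if not ENQUIRY.search(p)]

    # Step 2: cluster similar enquiry posts (greedy, single pass; an assumption).
    clusters = []
    for post in signal:
        toks = tokens(post)
        cluster = best_cluster(toks, clusters, threshold)
        if cluster is None:
            clusters.append({"signal": [post], "related": [], "vocab": toks})
        else:
            cluster["signal"].append(post)
            cluster["vocab"] |= toks

    # Step 3: collect related posts that lack the enquiry phrases.
    for post in others:
        cluster = best_cluster(tokens(post), clusters, threshold)
        if cluster is not None:
            cluster["related"].append(post)

    # Step 4: rank clusters. Counting enquiry posts is a stand-in for the
    # trained classifier mentioned in the abstract.
    return sorted(
        clusters,
        key=lambda c: (len(c["signal"]), len(c["related"])),
        reverse=True,
    )
```

Called on a list of post strings, `detect_rumor_clusters` returns the clusters in ranked order, each holding its enquiry posts and the related posts gathered around them. Because only the enquiry posts seed clusters, posts on topics that nobody questions never form a candidate cluster.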
Index Terms
- Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts