ABSTRACT
Twitter has evolved from a medium for conversation and opinion sharing among friends into a platform for sharing and disseminating information about current events. Events in the real world create a corresponding spurt of posts (tweets) on Twitter. Not all content posted on Twitter is trustworthy or useful in providing information about the event. In this paper, we analyzed the credibility of information in tweets corresponding to fourteen high-impact news events of 2011 around the globe. In the data we analyzed, on average 30% of the total tweets posted about an event contained situational information about the event, while 14% were spam. Only 17% of the total tweets posted about an event contained situational awareness information that was credible. Using regression analysis, we identified the important content-based and source-based features that can predict the credibility of information in a tweet. Prominent content-based features were the number of unique characters, swear words, pronouns, and emoticons in a tweet; prominent user-based features were the number of followers and the length of the username. We adopted a supervised machine learning and relevance feedback approach using the above features to rank tweets according to their credibility score. The performance of our ranking algorithm improved significantly when we applied a re-ranking strategy. Results show that extraction of credible information from Twitter can be automated with high confidence.
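As an illustration of the content-based and user-based features named in the abstract, the following is a minimal Python sketch of how such features might be computed for a single tweet. It is not the paper's implementation: the function name `extract_features`, the word lists `PRONOUNS` and `SWEAR_WORDS`, and the emoticon pattern are all assumptions made for this example, since the abstract does not specify the lexicons or the exact feature definitions.

```python
import re

# Hypothetical lexicons, for illustration only; the abstract does not say
# which word lists were used.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "she", "it", "they", "them"}
SWEAR_WORDS = {"damn", "hell"}

# Assumed pattern for a few common Western-style emoticons such as :) ;-( =D
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")


def extract_features(text, username, followers):
    """Return a dict of simple content-based and user-based features."""
    # Lowercased alphabetic tokens (apostrophes kept) for lexicon lookups.
    words = re.findall(r"[a-z']+", text.lower())
    return {
        # Content-based features
        "unique_chars": len(set(text)),
        "swear_words": sum(w in SWEAR_WORDS for w in words),
        "pronouns": sum(w in PRONOUNS for w in words),
        "emoticons": len(EMOTICON_RE.findall(text)),
        # User-based features
        "followers": followers,
        "username_length": len(username),
    }


features = extract_features("We saw it :)", username="alice", followers=120)
print(features)
```

A feature dictionary of this kind could then be fed to a supervised ranking model; which model and which additional features to use is left open here.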
Index Terms
- Credibility ranking of tweets during high impact events