ABSTRACT
As Twitter grows in popularity, assessing the trustworthiness and relevance of tweets becomes increasingly important for search. However, because tweets are so short, it is hard to extract ranking measures from the tweets' content alone. We present a novel ranking method called RAProp, which combines two orthogonal measures of the relevance and trustworthiness of a tweet. The first, called Feature Score, measures the trustworthiness of the source of the tweet by extracting features from a 3-layer Twitter ecosystem consisting of users, tweets and webpages. The second, called agreement analysis, estimates the trustworthiness of the content of a tweet by analyzing whether that content is independently corroborated by other tweets. We view the candidate result set of tweets as the vertices of a graph, with the edges measuring the estimated agreement between each pair of tweets. The feature score is propagated over this agreement graph to compute the top-k tweets that have both trustworthy sources and independent corroboration. Our evaluation on 16 million tweets from the TREC 2011 Microblog Dataset shows that, for top-30 precision, our method achieves 53% better precision than the current best-performing method on the data set, and an improvement of 300% over current Twitter Search.
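The propagation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the agreement measure or the propagation formula, so the token-level Jaccard overlap, the single additive propagation step, and the names `tokenize`, `agreement`, and `rank_by_propagation` are all assumptions made for illustration. Feature scores are taken as given inputs rather than learned from user, tweet, and webpage features.

```python
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real tweet preprocessing."""
    return set(text.lower().split())


def agreement(a, b):
    """Jaccard overlap of two token sets (an assumed agreement measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_propagation(tweets, feature_scores, k):
    """Return (top-k tweet indices, propagated scores) after one propagation step.

    Each tweet keeps its own feature score and additionally receives the
    feature score of every other tweet, weighted by their pairwise agreement.
    This additive rule is an assumption, not the formula from the paper.
    """
    tokens = [tokenize(t) for t in tweets]
    scores = list(feature_scores)
    # Every unordered pair of tweets is an edge of the agreement graph.
    for i, j in combinations(range(len(tweets)), 2):
        weight = agreement(tokens[i], tokens[j])
        scores[i] += weight * feature_scores[j]
        scores[j] += weight * feature_scores[i]
    order = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores


# Hypothetical candidate set: two tweets corroborate each other, one does not.
tweets = [
    "earthquake hits city center",
    "earthquake hits city center now",
    "win a free phone click here",
]
feature_scores = [0.6, 0.5, 0.7]  # made-up source-trust scores

top, scores = rank_by_propagation(tweets, feature_scores, k=2)
print(top)
```

In this toy input the third tweet has the highest feature score but shares no tokens with the others, so it receives no propagated score and falls out of the top 2, while the two mutually corroborating tweets rise. That is the behaviour the abstract attributes to combining source trust with independent corroboration.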