ABSTRACT
Twitter (and similar microblogging services) has become a central nexus for discussion of the topics of the day. Twitter data contains rich content and structured information on users' topics of interest and behavior patterns. Correctly analyzing and modeling Twitter data enables the prediction of user behavior and preferences in a variety of practical applications, such as tweet recommendation and followee recommendation. Although a number of models have been developed on Twitter data in prior work, most of them model only users' tweets and neglect the valuable retweet information in the data. Incorporating users' retweet content as well as their retweet behavior would enhance the predictive power of such models. In this paper, we propose two novel Bayesian nonparametric models, URM and UCM, on retweet data. Both integrate the analysis of tweet text and users' retweet behavior in the same probabilistic framework, and both jointly model users' interests in tweets and retweets. As nonparametric models, URM and UCM automatically determine the model parameters from the input data, avoiding arbitrary parameter settings. Extensive experiments on real-world Twitter data show that both URM and UCM outperform all the baselines, and that UCM further outperforms URM, confirming the appropriateness of our models for retweet modeling.
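To make concrete what it means for a nonparametric model to determine its own size from the data, the sketch below simulates a Chinese restaurant process, a standard construction in Bayesian nonparametrics. It is an illustration only: the abstract does not say which prior URM and UCM use, and the function name `crp_table_counts` and the concentration parameter `alpha` are assumptions introduced here, not the authors' notation.

```python
import random

def crp_table_counts(num_customers, alpha, seed=0):
    """Simulate a Chinese restaurant process and return the size of each table.

    Hypothetical helper for illustration; alpha > 0 is the concentration
    parameter controlling how readily new clusters (tables) are created.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of customers seated at table k
    for n in range(num_customers):
        # With n customers already seated, an existing table k is chosen with
        # probability counts[k] / (n + alpha); a new table is opened with
        # probability alpha / (n + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new table (a new cluster)
        else:
            counts[k] += 1     # join an existing table
    return counts

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "items ->", len(crp_table_counts(n, alpha=1.0)), "clusters")
```

The number of clusters is not fixed in advance; it is an outcome of the sampling process and tends to grow slowly as more items arrive. This is the sense in which such priors avoid setting the number of topics or groups by hand.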
Index Terms
- Modeling a Retweet Network via an Adaptive Bayesian Approach