DOI: 10.1145/2983323.2983760
research-article

Adaptive Evolutionary Filtering in Real-Time Twitter Stream

Published: 24 October 2016

ABSTRACT

With the explosive growth of microblogging services, Twitter has become a leading platform for real-time, worldwide information. Users explore breaking news or general topics on Twitter according to their interests. However, the sheer volume of incoming tweets leads to information overload. Filtering interesting tweets from the real-time stream based on users' interest profiles can therefore help users access the relevant, key information hidden among the tweets. At the same time, the real-time Twitter stream contains an enormous number of noisy and redundant tweets, so the filtering process should take previously pushed tweets into account in order to provide users with diverse tweets. Moreover, unlike traditional document summarization methods, which focus on static datasets, the Twitter stream is dynamic, fast-arriving, and large-scale, which means that the decision whether to select an incoming tweet for a user must be made as early as possible. In this paper, we propose a novel adaptive evolutionary filtering framework that pushes interesting tweets to users from the real-time Twitter stream. First, we propose an adaptive evolutionary filtering algorithm to filter interesting tweets from the Twitter stream with respect to user interest profiles. We then use the maximal marginal relevance model within a fixed time window to estimate the relevance and diversity of candidate tweets. In addition, to cope with the enormous number of redundant tweets and to characterize the diversity of candidate tweets, we propose a hierarchical tweet representation learning model (HTM) that learns tweet representations dynamically over time. Experiments on large-scale real-time Twitter stream datasets demonstrate the efficiency and effectiveness of our framework.
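
The abstract names maximal marginal relevance (MMR) over a fixed time window but gives no formulation. As a rough illustration only, the following is a minimal sketch of generic greedy MMR selection, not the paper's algorithm. It assumes tweets and the interest profile are already dense vectors, uses cosine similarity for both relevance and redundancy, and introduces hypothetical names (`mmr_select`, `lam`, `k`, `pushed`) that do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(profile, window, pushed, k=2, lam=0.7):
    """Greedy MMR over one time window (illustrative assumption, not the paper's method).

    profile: vector for the user's interest profile
    window:  list of tweet vectors that arrived in the current window
    pushed:  vectors of tweets already pushed to the user
    k:       maximum number of tweets to push from this window
    lam:     trade-off between relevance (1.0) and diversity (0.0)
    Returns the indices into `window` in the order they were selected.
    """
    selected = []
    candidates = list(range(len(window)))
    history = list(pushed)  # copy so the caller's list is not modified

    def score(i):
        relevance = cosine(window[i], profile)
        # Redundancy is the highest similarity to anything already pushed.
        redundancy = max((cosine(window[i], h) for h in history), default=0.0)
        return lam * relevance - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        history.append(window[best])
        candidates.remove(best)
    return selected
```

With a profile of `[1.0, 1.0, 0.0]` and a window holding two identical tweets `[1, 0, 0]` plus one different tweet `[0, 1, 0]`, this sketch picks the first duplicate and then the different tweet, since the second duplicate is penalized for redundancy.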

Published in

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

CIKM '16 paper acceptance rate: 160 of 701 submissions, 23%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%
