research-article

Adaptive Evolutionary Filtering in Real-Time Twitter Stream

Authors:
Feifan Fan, Peking University, Beijing, China
Yansong Feng, Peking University, Beijing, China
Lili Yao, Peking University, Beijing, China
Dongyan Zhao, Peking University, Beijing, China

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, October 2016, Pages 1079–1088
https://doi.org/10.1145/2983323.2983760

Published: 24 October 2016

ABSTRACT

With the explosive growth of microblogging services, Twitter has become a leading platform for real-time, worldwide information. Users explore breaking news or general topics on Twitter according to their interests. However, the sheer volume of incoming tweets leads to information overload. Filtering interesting tweets from the real-time stream based on users' interest profiles can therefore help users easily access the relevant, key information hidden among the tweets. At the same time, the real-time Twitter stream contains an enormous number of noisy and redundant tweets, so the filtering process should take previously pushed tweets into account in order to provide users with diverse tweets. Moreover, unlike traditional document summarization methods, which focus on static datasets, the Twitter stream is dynamic, fast-arriving, and large-scale, which means we have to decide as early as possible whether to filter each incoming tweet for users from the real-time stream. In this paper, we propose a novel adaptive evolutionary filtering framework to push interesting tweets to users from the real-time Twitter stream. First, we propose an adaptive evolutionary filtering algorithm to filter interesting tweets from the Twitter stream with respect to user interest profiles. We then use the maximal marginal relevance model within a fixed time window to estimate the relevance and diversity of candidate tweets. In addition, to cope with the enormous number of redundant tweets and to characterize the diversity of candidate tweets, we propose a hierarchical tweet representation learning model (HTM) that learns tweet representations dynamically over time. Experiments on large-scale real-time Twitter stream datasets demonstrate the efficiency and effectiveness of our framework.
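As a rough illustration of the maximal marginal relevance step mentioned in the abstract, the sketch below greedily picks tweets from one time window by trading relevance to an interest profile against redundancy with tweets already pushed. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`cosine`, `vectorize`, `mmr_select`), the bag-of-words cosine similarity, and the weight `lam=0.7` are all hypothetical choices made for this example, and the paper's learned HTM representations are not modeled here.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vectorize(text: str) -> Counter:
    """Assumed representation: lowercase whitespace tokens with counts."""
    return Counter(text.lower().split())


def mmr_select(profile, window, pushed, k=2, lam=0.7):
    """Greedily pick up to k tweets from one time window by MMR.

    score(t) = lam * sim(t, profile) - (1 - lam) * max sim(t, earlier pushes)
    """
    query = vectorize(profile)
    # Redundancy is measured against tweets pushed earlier and in this window.
    history = [vectorize(t) for t in pushed]
    candidates = list(window)
    selected = []
    while candidates and len(selected) < k:
        def score(tweet):
            vec = vectorize(tweet)
            relevance = cosine(vec, query)
            redundancy = max((cosine(vec, h) for h in history), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        candidates.remove(best)
        selected.append(best)
        history.append(vectorize(best))
    return selected
```

Calling `mmr_select("earthquake japan", window, pushed=[])` on a window that contains two near-duplicate earthquake tweets and one distinct rescue update would tend to return one of the near-duplicates plus the rescue update, since the second near-duplicate is penalized for its overlap with the first pick.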

Index Terms

1. Information systems
  1. Information retrieval

Published in
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323
General Chairs:
Snehasis Mukhopadhyay, Indiana University Purdue University Indianapolis, USA
ChengXiang Zhai, University of Illinois at Urbana-Champaign, USA
Program Chairs:
Elisa Bertino, Purdue University
Fabio Crestani, University of Lugano
Javed Mostafa, University of North Carolina
Jie Tang, Tsinghua University
Luo Si, Alibaba Group Inc & Purdue University
Xiaofang Zhou, University of Queensland
Yi Chang, Yahoo Research
Yunyao Li, IBM Research - Almaden
Parikshit Sondhi, WalmartLabs
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
evolutionary filtering
timeline
twitter stream
Acceptance Rates
CIKM '16 paper acceptance rate: 160 of 701 submissions, 23%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%