Efficient monitoring of personalized hot news over Web 2.0 streams

  • Special Issue Paper
Computer Science - Research and Development

Abstract

Web 2.0 streams, such as blog postings, micro-blogging tweets, and RSS feeds from online communities, offer a wealth of up-to-date news about real-world events and societal discussions. From a user’s perspective, it is becoming increasingly hard to obtain a decent overview of recent events from these massive, continuously flowing streams of information. Ideally, a system would continuously assemble recent information, ranked by its current social impact but also weighted by each user’s personal interests. In this work, we develop methods to meet these requirements. The presented approach continuously tracks the most popular tags attached to the incoming items and, based on these tags, constructs a dynamic top-k query. By continuously evaluating this query on the incoming stream, we retrieve the currently hottest items. These items are then fed into an engine that re-ranks them with respect to user-specified interests, given in the form of term-based topic descriptions. Given the high rate of incoming data and the immense number of user profiles, this calls for high-performance algorithms that efficiently retrieve hot documents and subsequently personalize them based on user profiles. We present a combined solution that builds on our prior work on information filtering and show how it can be used together with the method developed here for continuously determining the hottest documents. To demonstrate the suitability of our approach, we report a performance evaluation on a real-world dataset obtained from a weblog crawl.
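
The pipeline outlined in the abstract (track popular tags, derive a top-k query from them, retrieve the hottest items, re-rank them per user) can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given in the abstract. The class `HotItemTracker`, the function `personalize`, the count-based sliding window, the scoring rule (sum of the window counts of an item's tags that belong to the query), and the profile match (number of shared terms) are all assumptions made purely for illustration, and the sketch rescans the window on every request instead of using the efficient index structures the paper is concerned with.

```python
from collections import Counter, deque


class HotItemTracker:
    """Toy tracker over a count-based sliding window (hypothetical design)."""

    def __init__(self, window_size, num_query_tags, k):
        self.window_size = window_size        # number of most recent items kept
        self.num_query_tags = num_query_tags  # how many popular tags form the query
        self.k = k                            # how many hot items to report
        self.window = deque()
        self.tag_counts = Counter()

    def add(self, item_id, tags, terms):
        """Insert a new stream item and evict the oldest one if the window is full."""
        item = (item_id, frozenset(tags), frozenset(terms))
        self.window.append(item)
        self.tag_counts.update(item[1])
        if len(self.window) > self.window_size:
            _, old_tags, _ = self.window.popleft()
            for tag in old_tags:
                self.tag_counts[tag] -= 1
                if self.tag_counts[tag] == 0:
                    del self.tag_counts[tag]

    def query(self):
        """Dynamic query: the currently most popular tags with their counts."""
        ranked = sorted(self.tag_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return dict(ranked[: self.num_query_tags])

    def hot_items(self):
        """Top-k window items, scored by the assumed sum-of-tag-counts rule."""
        weights = self.query()
        scored = [
            (sum(weights.get(tag, 0) for tag in tags), item_id, terms)
            for item_id, tags, terms in self.window
        ]
        scored.sort(key=lambda s: (-s[0], s[1]))  # ties broken by item id
        return scored[: self.k]


def personalize(hot, profile_terms):
    """Re-rank hot items by term overlap with a user profile, then by hotness."""
    profile = frozenset(profile_terms)
    ranked = sorted(hot, key=lambda s: (-len(profile & s[2]), -s[0], s[1]))
    return [item_id for _, item_id, _ in ranked]


# Usage with made-up items: (id, tags, content terms).
tracker = HotItemTracker(window_size=5, num_query_tags=2, k=3)
stream = [
    ("a", ["election"], ["vote", "poll"]),
    ("b", ["election", "sports"], ["football", "vote"]),
    ("c", ["sports"], ["football", "goal"]),
    ("d", ["election"], ["debate"]),
    ("e", ["music"], ["concert"]),
]
for item_id, tags, terms in stream:
    tracker.add(item_id, tags, terms)

hot = tracker.hot_items()
print([item_id for _, item_id, _ in hot])  # globally hottest items
print(personalize(hot, {"debate"}))        # the same items, re-ranked for one user
```

In this sketch the hot list is computed once for all users and only the small result set is re-ranked per profile, which mirrors the two-stage structure described above. A realistic system would maintain both stages incrementally instead of re-sorting on every call.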

Author information

Corresponding author

Correspondence to Sebastian Michel.

Additional information

This work is partially supported by NCCR-MICS (grant number 5005-67322), the FP7 EU Project OKKAM (contract no. ICT-215032), and the German Research Foundation (DFG) Cluster of Excellence “Multimodal Computing and Interaction” (MMCI).

About this article

Cite this article

Haghani, P., Michel, S. & Aberer, K. Efficient monitoring of personalized hot news over Web 2.0 streams. Comput Sci Res Dev 27, 81–92 (2012). https://doi.org/10.1007/s00450-011-0178-9

  • DOI: https://doi.org/10.1007/s00450-011-0178-9
