How to Catch L₂-Heavy-Hitters on Sliding Windows

  • Conference paper
Computing and Combinatorics (COCOON 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7936)

Abstract

Finding heavy elements (heavy hitters) in streaming data is one of the central and well-understood tasks. In the sliding window model of streaming, where elements eventually expire, the problem of finding L₂-heavy elements has nevertheless remained completely open, despite multiple papers and considerable success in finding L₁-heavy elements.

Since the L₂-heavy element problem does not satisfy certain conditions, existing methods for sliding window algorithms, such as smooth histograms or exponential histograms, are not directly applicable to it. In this paper, we develop the first polylogarithmic-memory algorithm for finding L₂-heavy elements in the sliding window model.

Our technique allows us to find not only L₂-heavy elements, but also heavy elements with respect to any Lₚ with 0 < p ≤ 2 on sliding windows. We thereby completely “close the gap” and resolve the question of finding Lₚ-heavy elements in the sliding window model with polylogarithmic memory, since it is well known that for p > 2 this task is impossible.
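
For concreteness, one common way to formalize the notion is to call an element φ-heavy with respect to Lₚ when its frequency in the window is at least φ times the Lₚ norm of the window's frequency vector. The sketch below computes this exactly by storing the entire window, so it illustrates the definition only and is not the polylogarithmic-memory algorithm of the paper. The threshold parameter `phi`, the window length `n`, and the function names are assumptions introduced for illustration.

```python
from collections import Counter, deque


def lp_heavy_hitters(window, p=2.0, phi=0.5):
    """Exact phi-heavy elements of `window` with respect to the L_p norm.

    Assumed definition: x is heavy if count(x) >= phi * ||f||_p, where f is
    the vector of element counts inside the window.
    """
    counts = Counter(window)
    norm = sum(c ** p for c in counts.values()) ** (1.0 / p)
    return {x for x, c in counts.items() if c >= phi * norm}


def sliding_heavy_hitters(stream, n, p=2.0, phi=0.5):
    """Yield the heavy set of the last `n` items after each arrival.

    Brute force: keeps all `n` items in memory, unlike a streaming algorithm.
    """
    window = deque(maxlen=n)  # the oldest item expires automatically
    for item in stream:
        window.append(item)
        yield lp_heavy_hitters(window, p, phi)


# One element seen 10 times among 90 elements seen once each.
example = ["x"] * 10 + list(range(90))
heavy_l1 = lp_heavy_hitters(example, p=1.0, phi=0.5)  # set(): 10 < 0.5 * 100
heavy_l2 = lp_heavy_hitters(example, p=2.0, phi=0.5)  # {'x'}: 10 >= 0.5 * sqrt(190)
```

The example window shows why the L₂ notion is the more sensitive one under this definition: the same element is far below the L₁ threshold yet clearly L₂-heavy.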

We demonstrate the broader applicability of our method on two additional examples: we show how to obtain a sliding window approximation of the similarity of two streams, and of the fraction of elements that appear exactly a specified number of times within the window (the α-rarity problem). In these two illustrative examples of our method, we replace the current expected memory bounds with worst-case bounds.
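
As a point of reference for these two quantities, the following sketch computes them exactly on explicitly stored windows. It assumes that α-rarity is measured over the distinct elements of the window and that similarity means the Jaccard resemblance of the two windows' element sets; both readings, and the function names, are assumptions made for illustration rather than the definitions or the approximation method of the paper.

```python
from collections import Counter


def alpha_rarity(window, alpha):
    """Fraction of distinct elements occurring exactly `alpha` times."""
    counts = Counter(window)
    if not counts:
        return 0.0  # convention chosen here for an empty window
    exact = sum(1 for c in counts.values() if c == alpha)
    return exact / len(counts)


def jaccard_similarity(window_a, window_b):
    """Jaccard resemblance |A ∩ B| / |A ∪ B| of the two windows' element sets."""
    a, b = set(window_a), set(window_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty windows
    return len(a & b) / len(a | b)
```

For instance, in the window a, b, b, c two of the three distinct elements occur exactly once, so the 1-rarity under this reading is 2/3.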

A preliminary full version of this paper appears online [10].

References

  1. Aggarwal, C.C.: Data streams: models and algorithms. Springer, New York (2007)

  2. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58(1), 137–147 (1999)

  3. Arasu, A., Manku, G.S.: Approximate counts and quantiles over sliding windows. In: PODS 2004, pp. 286–296 (2004)

  4. Bandi, N., Agrawal, D., Abbadi, A.E.: Fast algorithms for heavy distinct hitters using associative memories. In: ICDSC 2007, p. 6 (2007)

  5. Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S.P. (eds.) RANDOM 2002. LNCS, vol. 2483, pp. 1–10. Springer, Heidelberg (2002)

  6. Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D.: An information statistics approach to data stream and communication complexity. In: FOCS 2002, pp. 209–218 (2002)

  7. Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: SODA 2002, pp. 623–632 (2002)

  8. Bhuvanagiri, L., Ganguly, S., Kesh, D., Saha, C.: Simpler algorithm for estimating frequency moments of data streams. In: SODA 2006, pp. 708–713 (2006)

  9. Braverman, V., Ostrovsky, R.: Smooth histograms for sliding windows. In: FOCS 2007, pp. 283–293 (2007)

  10. Braverman, V., Gelles, R., Ostrovsky, R.: How to catch L₂-heavy-hitters on sliding windows (2010), http://arxiv.org/abs/1012.3130

  11. Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. Journal of Computer and System Sciences 60(3), 630–659 (2000)

  12. Broder, A.Z., Glassman, S.C., Manasse, M.S., Zweig, G.: Syntactic clustering of the web. Computer Networks and ISDN Systems 29(8-13), 1157–1166 (1997)

  13. Broder, A.: On the resemblance and containment of documents. In: Proceedings of the Compression and Complexity of Sequences 1997, pp. 21–29 (1997)

  14. Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 693–703. Springer, Heidelberg (2002)

  15. Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences 55(3), 441–453 (1997)

  16. Cohen, E., Strauss, M.J.: Maintaining time-decaying stream aggregates. Journal of Algorithms 59(1), 19–36 (2006)

  17. Cormode, G., Muthukrishnan, S.: An improved data stream summary: The count-min sketch and its applications. In: Farach-Colton, M. (ed.) LATIN 2004. LNCS, vol. 2976, pp. 29–38. Springer, Heidelberg (2004)

  18. Cormode, G., Hadjieleftheriou, M.: Finding frequent items in data streams. Proc. VLDB Endow. 1(2), 1530–1541 (2008)

  19. Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D.: Finding hierarchical heavy hitters in data streams. In: VLDB 2003, pp. 464–475 (2003)

  20. Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)

  21. Datar, M., Muthukrishnan, S.: Estimating rarity and similarity over data stream windows. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 323–334. Springer, Heidelberg (2002)

  22. Datar, M., Gionis, A., Indyk, P., Motwani, R.: Maintaining stream statistics over sliding windows (extended abstract). In: SODA 2002, pp. 635–644 (2002)

  23. Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 348–360. Springer, Heidelberg (2002)

  24. Estan, C., Varghese, G.: New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans. Comput. Syst. 21(3), 270–313 (2003)

  25. Flajolet, P., Martin, G.N.: Probabilistic counting. In: FOCS 1983, pp. 76–82 (1983)

  26. Gibbons, P.B., Tirthapura, S.: Estimating simple functions on the union of data streams. In: SPAA 2001, pp. 281–291 (2001)

  27. Golab, L., DeHaan, D., Demaine, E.D., López-Ortiz, A., Munro, J.I.: Identifying frequent items in sliding windows over on-line packet streams. In: IMC 2003, pp. 173–178 (2003)

  28. Hung, R.Y.S., Ting, H.F.: Finding heavy hitters over the sliding window of a weighted data stream. In: Laber, E.S., Bornstein, C., Nogueira, L.T., Faria, L. (eds.) LATIN 2008. LNCS, vol. 4957, pp. 699–710. Springer, Heidelberg (2008)

  29. Hung, R.Y., Lee, L.K., Ting, H.: Finding frequent items over sliding windows with constant update time. Information Processing Letters 110(7), 257–260 (2010)

  30. Indyk, P.: A small approximately min-wise independent family of hash functions. In: SODA 1999, pp. 454–456 (1999)

  31. Indyk, P.: Heavy hitters and sparse approximations, lecture notes (2009), http://people.csail.mit.edu/indyk/Rice/lec4.pdf

  32. Indyk, P., Woodruff, D.: Optimal approximations of the frequency moments of data streams. In: STOC 2005, pp. 202–208 (2005)

  33. Jin, C., Qian, W., Sha, C., Yu, J.X., Zhou, A.: Dynamically maintaining frequent items over a data stream. In: CIKM 2003, pp. 287–294 (2003)

  34. Kane, D.M., Nelson, J., Woodruff, D.P.: An optimal algorithm for the distinct elements problem. In: PODS 2010, pp. 41–52 (2010)

  35. Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28, 51–55 (2003)

  36. Lee, L.K., Ting, H.F.: A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: PODS 2006, pp. 290–297 (2006)

  37. Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: VLDB 2002, pp. 346–357 (2002)

  38. Open problems in data streams and related topics. IITK Workshop on Algorithms for Data Streams 2006 (2006), compiled and edited by McGregor, A.

  39. Metwally, A., Agrawal, D., El Abbadi, A.: Efficient computation of frequent and top-k elements in data streams. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, pp. 398–412. Springer, Heidelberg (2005)

  40. Muthukrishnan, S.: Data streams: Algorithms and applications. Now Publishers Inc. (2005)

  41. Nie, G., Lu, Z.: Approximate frequency counts in sliding window over data stream. In: Canadian Conference on Electrical and Computer Engineering, pp. 2232–2236 (2005)

  42. Saks, M., Sun, X.: Space lower bounds for distance approximation in the data stream model. In: STOC 2002, pp. 360–369 (2002)

  43. Sen, S., Wang, J.: Analyzing peer-to-peer traffic across large networks. In: IMW 2002, pp. 137–150. ACM (2002)

  44. Tirthapura, S., Woodruff, D.P.: A general method for estimating correlated aggregates over a data stream. In: International Conference on Data Engineering, pp. 162–173 (2012)

  45. Zhang, L., Guan, Y.: Frequency estimation over sliding windows. In: International Conference on Data Engineering, pp. 1385–1387 (2008)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Braverman, V., Gelles, R., Ostrovsky, R. (2013). How to Catch L₂-Heavy-Hitters on Sliding Windows. In: Du, DZ., Zhang, G. (eds) Computing and Combinatorics. COCOON 2013. Lecture Notes in Computer Science, vol 7936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38768-5_56

  • DOI: https://doi.org/10.1007/978-3-642-38768-5_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38767-8

  • Online ISBN: 978-3-642-38768-5

  • eBook Packages: Computer Science, Computer Science (R0)
