How to Catch L₂-Heavy-Hitters on Sliding Windows

Braverman, Vladimir; Gelles, Ran; Ostrovsky, Rafail

doi:10.1007/978-3-642-38768-5_56

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7936)

Included in the following conference series:

International Computing and Combinatorics Conference

Abstract

Finding heavy elements (heavy hitters) in streaming data is one of the central and well-understood streaming tasks. Yet in the sliding window model of streaming, where elements eventually expire, the problem of finding L₂-heavy elements has remained completely open, despite multiple papers and considerable success in finding L₁-heavy elements.

Since the L₂-heavy element problem doesn’t satisfy certain conditions, existing methods for sliding window algorithms, such as smooth histograms or exponential histograms, are not directly applicable to it. In this paper, we develop the first polylogarithmic-memory algorithm for finding L₂-heavy elements in the sliding window model.

Our technique allows us to find not only L₂-heavy elements, but also heavy elements with respect to any L_p with 0 < p ≤ 2 on sliding windows. This completely “closes the gap” and resolves the question of finding L_p-heavy elements in the sliding window model with polylogarithmic memory, since it is well known that for p > 2 this task is impossible.
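
To make the notion concrete, the following is a minimal brute-force sketch in Python, not the paper's algorithm. It assumes the common definition (not spelled out in the abstract) that an element with window frequency f_i is γ-heavy with respect to L_p when f_i ≥ γ·‖f‖_p. The function names and the parameter `gamma` are hypothetical, and the code stores the whole window, which is exactly the linear memory cost a polylogarithmic-memory algorithm avoids.

```python
from collections import Counter, deque


def lp_heavy_hitters(window, p, gamma):
    """Exact L_p-heavy elements of `window` (assumed definition:
    count >= gamma * ||f||_p, where f is the frequency vector)."""
    freq = Counter(window)
    norm = sum(c ** p for c in freq.values()) ** (1.0 / p)
    return {x for x, c in freq.items() if c >= gamma * norm}


def sliding_heavy_hitters(stream, n, p, gamma):
    """Yield the exact L_p-heavy set of the last n items after each arrival.

    Brute force: keeps all n items, so memory is linear in the window size.
    """
    window = deque(maxlen=n)  # the oldest item expires automatically
    for item in stream:
        window.append(item)
        yield lp_heavy_hitters(window, p, gamma)


# 'a' occurs 10 times among 100 items; the other 90 items are distinct.
first_window = ['a'] * 10 + list(range(90))
print(lp_heavy_hitters(first_window, p=2, gamma=0.5))  # 'a' is L2-heavy
print(lp_heavy_hitters(first_window, p=1, gamma=0.5))  # nothing is L1-heavy
```

In this toy window the L₂ norm is √190 ≈ 13.8 while the L₁ norm is 100, so with γ = 0.5 the element 'a' passes the L₂ threshold but not the L₁ threshold. Feeding 100 further distinct items through `sliding_heavy_hitters` with n = 100 makes 'a' expire, and the heavy set becomes empty.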

We demonstrate the broader applicability of our method on two additional examples: we show how to obtain a sliding window approximation of the similarity of two streams, and of the fraction of elements that appear exactly a specified number of times within the window (the α-rarity problem). In these two illustrative examples of our method, we replace the current expected memory bounds with worst-case bounds.
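
For reference, the two quantities can be computed exactly on a fully stored window as in the sketch below. This is only a baseline that a small-memory approximation would be compared against, not the method of the paper. It assumes the usual definitions from the sliding window literature: similarity is the Jaccard resemblance of the sets of distinct elements in the two windows, and α-rarity is the fraction of distinct elements occurring exactly α times. The function names and the handling of empty windows are hypothetical choices.

```python
from collections import Counter


def window_similarity(window_a, window_b):
    """Jaccard similarity of the distinct elements of two windows."""
    a, b = set(window_a), set(window_b)
    union = a | b
    # Convention chosen here: two empty windows count as identical.
    return len(a & b) / len(union) if union else 1.0


def alpha_rarity(window, alpha):
    """Fraction of distinct elements that occur exactly `alpha` times."""
    freq = Counter(window)
    if not freq:
        return 0.0  # convention chosen here for an empty window
    return sum(1 for c in freq.values() if c == alpha) / len(freq)


print(window_similarity([1, 2, 3, 3], [3, 4]))  # 1 shared of 4 distinct
print(alpha_rarity([1, 1, 2, 3, 3, 4], 2))      # 2 of 4 distinct occur twice
```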

A preliminary full version of this paper appears online [10].

References

1. Aggarwal, C.C.: Data streams: models and algorithms. Springer, New York (2007)
2. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58(1), 137–147 (1999)
3. Arasu, A., Manku, G.S.: Approximate counts and quantiles over sliding windows. In: PODS 2004, pp. 286–296 (2004)
4. Bandi, N., Agrawal, D., Abbadi, A.E.: Fast algorithms for heavy distinct hitters using associative memories. In: ICDSC 2007, p. 6 (2007)
5. Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S.P. (eds.) RANDOM 2002. LNCS, vol. 2483, pp. 1–10. Springer, Heidelberg (2002)
6. Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D.: An information statistics approach to data stream and communication complexity. In: FOCS 2002, pp. 209–218 (2002)
7. Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: SODA 2002, pp. 623–632 (2002)
8. Bhuvanagiri, L., Ganguly, S., Kesh, D., Saha, C.: Simpler algorithm for estimating frequency moments of data streams. In: SODA 2006, pp. 708–713 (2006)
9. Braverman, V., Ostrovsky, R.: Smooth histograms for sliding windows. In: FOCS 2007, pp. 283–293 (2007)
10. Braverman, V., Gelles, R., Ostrovsky, R.: How to catch L₂-heavy-hitters on sliding windows (2010), http://arxiv.org/abs/1012.3130
11. Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. Journal of Computer and System Sciences 60(3), 630–659 (2000)
12. Broder, A.Z., Glassman, S.C., Manasse, M.S., Zweig, G.: Syntactic clustering of the web. Computer Networks and ISDN Systems 29(8-13), 1157–1166 (1997)
13. Broder, A.: On the resemblance and containment of documents. In: Proceedings of the Compression and Complexity of Sequences 1997, pp. 21–29 (1997)
14. Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 693–703. Springer, Heidelberg (2002)
15. Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences 55(3), 441–453 (1997)
16. Cohen, E., Strauss, M.J.: Maintaining time-decaying stream aggregates. Journal of Algorithms 59(1), 19–36 (2006)
17. Cormode, G., Muthukrishnan, S.: An improved data stream summary: The count-min sketch and its applications. In: Farach-Colton, M. (ed.) LATIN 2004. LNCS, vol. 2976, pp. 29–38. Springer, Heidelberg (2004)
18. Cormode, G., Hadjieleftheriou, M.: Finding frequent items in data streams. Proc. VLDB Endow. 1(2), 1530–1541 (2008)
19. Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D.: Finding hierarchical heavy hitters in data streams. In: VLDB 2003, pp. 464–475 (2003)
20. Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)
21. Datar, M., Muthukrishnan, S.: Estimating rarity and similarity over data stream windows. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 323–334. Springer, Heidelberg (2002)
22. Datar, M., Gionis, A., Indyk, P., Motwani, R.: Maintaining stream statistics over sliding windows (extended abstract). In: SODA 2002, pp. 635–644 (2002)
23. Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 348–360. Springer, Heidelberg (2002)
24. Estan, C., Varghese, G.: New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans. Comput. Syst. 21(3), 270–313 (2003)
25. Flajolet, P., Martin, G.N.: Probabilistic counting. In: FOCS 1983, pp. 76–82 (1983)
26. Gibbons, P.B., Tirthapura, S.: Estimating simple functions on the union of data streams. In: SPAA 2001, pp. 281–291 (2001)
27. Golab, L., DeHaan, D., Demaine, E.D., López-Ortiz, A., Munro, J.I.: Identifying frequent items in sliding windows over on-line packet streams. In: IMC 2003, pp. 173–178 (2003)
28. Hung, R.Y.S., Ting, H.F.: Finding heavy hitters over the sliding window of a weighted data stream. In: Laber, E.S., Bornstein, C., Nogueira, L.T., Faria, L. (eds.) LATIN 2008. LNCS, vol. 4957, pp. 699–710. Springer, Heidelberg (2008)
29. Hung, R.Y., Lee, L.K., Ting, H.: Finding frequent items over sliding windows with constant update time. Information Processing Letters 110(7), 257–260 (2010)
30. Indyk, P.: A small approximately min-wise independent family of hash functions. In: SODA 1999, pp. 454–456 (1999)
31. Indyk, P.: Heavy hitters and sparse approximations, lecture notes (2009), http://people.csail.mit.edu/indyk/Rice/lec4.pdf
32. Indyk, P., Woodruff, D.: Optimal approximations of the frequency moments of data streams. In: STOC 2005, pp. 202–208 (2005)
33. Jin, C., Qian, W., Sha, C., Yu, J.X., Zhou, A.: Dynamically maintaining frequent items over a data stream. In: CIKM 2003, pp. 287–294 (2003)
34. Kane, D.M., Nelson, J., Woodruff, D.P.: An optimal algorithm for the distinct elements problem. In: PODS 2010, pp. 41–52 (2010)
35. Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28, 51–55 (2003)
36. Lee, L.K., Ting, H.F.: A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: PODS 2006, pp. 290–297 (2006)
37. Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: VLDB 2002, pp. 346–357 (2002)
38. Open problems in data streams and related topics. IITK Workshop on Algorithms for Data Streams 2006 (2006), compiled and edited by McGregor, A.
39. Metwally, A., Agrawal, D., El Abbadi, A.: Efficient computation of frequent and top-k elements in data streams. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, pp. 398–412. Springer, Heidelberg (2005)
40. Muthukrishnan, S.: Data streams: Algorithms and applications. Now Publishers Inc. (2005)
41. Nie, G., Lu, Z.: Approximate frequency counts in sliding window over data stream. In: Canadian Conference on Electrical and Computer Engineering, pp. 2232–2236 (2005)
42. Saks, M., Sun, X.: Space lower bounds for distance approximation in the data stream model. In: STOC 2002, pp. 360–369 (2002)
43. Sen, S., Wang, J.: Analyzing peer-to-peer traffic across large networks. In: IMW 2002, pp. 137–150. ACM (2002)
44. Tirthapura, S., Woodruff, D.P.: A general method for estimating correlated aggregates over a data stream. In: International Conference on Data Engineering, pp. 162–173 (2012)
45. Zhang, L., Guan, Y.: Frequency estimation over sliding windows. In: International Conference on Data Engineering, pp. 1385–1387 (2008)

Author information

Authors and Affiliations

Department of Computer Science, Johns Hopkins University, USA
Vladimir Braverman
Department of Computer Science, University of California, Los Angeles, USA
Ran Gelles & Rafail Ostrovsky
Department of Mathematics, University of California, Los Angeles, USA
Rafail Ostrovsky

Editor information

Editors and Affiliations

Department of Computer Science, University of Texas at Dallas, 75080, Richardson, TX, USA
Ding-Zhu Du
College of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Guochuan Zhang

Cite this paper

Braverman, V., Gelles, R., Ostrovsky, R. (2013). How to Catch L₂-Heavy-Hitters on Sliding Windows. In: Du, DZ., Zhang, G. (eds) Computing and Combinatorics. COCOON 2013. Lecture Notes in Computer Science, vol 7936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38768-5_56

DOI: https://doi.org/10.1007/978-3-642-38768-5_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38767-8
Online ISBN: 978-3-642-38768-5
eBook Packages: Computer Science, Computer Science (R0)
