Abstract
Finding heavy elements (heavy hitters) in streaming data is a central and well-understood task. Yet in the sliding window model of streaming, where elements eventually expire, the problem of finding L_2-heavy elements has remained completely open, despite multiple papers and considerable success in finding L_1-heavy elements.
Since the L_2-heavy element problem does not satisfy certain conditions, existing methods for sliding window algorithms, such as smooth histograms or exponential histograms, are not directly applicable to it. In this paper, we develop the first polylogarithmic-memory algorithm for finding L_2-heavy elements in the sliding window model.
Our technique allows us to find not only L_2-heavy elements, but also heavy elements with respect to any L_p with 0 < p ≤ 2 on sliding windows. This closes the gap and resolves the question of finding L_p-heavy elements in the sliding window model with polylogarithmic memory, since it is well known that for p > 2 this task is impossible.
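To make the notion of an L_p-heavy element concrete, the following minimal sketch computes heavy elements exactly over the last n items of a stream. It is a brute-force reference that stores the whole window, not the polylogarithmic-memory algorithm of the paper. It assumes the common definition in which an element is γ-heavy when its count within the window is at least γ times the L_p norm of the window's frequency vector; the names `lp_heavy_elements`, `ExactSlidingWindow`, and `gamma` are illustrative and do not come from the paper.

```python
from collections import Counter, deque


def lp_heavy_elements(window, p, gamma):
    """Return the elements whose count is at least gamma * ||f||_p.

    `window` is any iterable of hashable items; f is its frequency vector.
    Assumed definition of "heavy" (not taken verbatim from the paper).
    """
    counts = Counter(window)
    norm = sum(c ** p for c in counts.values()) ** (1.0 / p)
    return {x for x, c in counts.items() if c >= gamma * norm}


class ExactSlidingWindow:
    """Keeps the last `size` items explicitly, so memory is linear in `size`."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest item when a new one arrives.
        self.items = deque(maxlen=size)

    def push(self, x):
        self.items.append(x)

    def heavy(self, p, gamma):
        return lp_heavy_elements(self.items, p, gamma)


# One element occurring 10 times among 90 distinct singletons.
w = ExactSlidingWindow(100)
for _ in range(10):
    w.push("a")
for i in range(90):
    w.push(i)

l1_heavy = w.heavy(p=1, gamma=0.5)  # threshold 0.5 * 100 = 50
l2_heavy = w.heavy(p=2, gamma=0.5)  # threshold 0.5 * sqrt(190), about 6.9
```

In this example "a" is L_2-heavy but not L_1-heavy for the same γ, which illustrates why L_2-heaviness is the weaker condition and hence the harder one to detect. Pushing 10 further items would expire every copy of "a" from the window.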
We demonstrate the broader applicability of our method with two additional examples: we show how to obtain a sliding window approximation of the similarity of two streams, and of the fraction of elements that appear exactly a specified number of times within the window (the α-rarity problem). In both examples, our method replaces the current expected memory bounds with worst-case bounds.
A preliminary full version of this paper appears online [10].
References
Aggarwal, C.C.: Data streams: models and algorithms. Springer, New York (2007)
Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58(1), 137–147 (1999)
Arasu, A., Manku, G.S.: Approximate counts and quantiles over sliding windows. In: PODS 2004, pp. 286–296 (2004)
Bandi, N., Agrawal, D., Abbadi, A.E.: Fast algorithms for heavy distinct hitters using associative memories. In: ICDSC 2007, p. 6 (2007)
Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S.P. (eds.) RANDOM 2002. LNCS, vol. 2483, pp. 1–10. Springer, Heidelberg (2002)
Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D.: An information statistics approach to data stream and communication complexity. In: FOCS 2002, pp. 209–218 (2002)
Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: SODA 2002, pp. 623–632 (2002)
Bhuvanagiri, L., Ganguly, S., Kesh, D., Saha, C.: Simpler algorithm for estimating frequency moments of data streams. In: SODA 2006, pp. 708–713 (2006)
Braverman, V., Ostrovsky, R.: Smooth histograms for sliding windows. In: FOCS 2007, pp. 283–293 (2007)
Braverman, V., Gelles, R., Ostrovsky, R.: How to catch L_2-heavy-hitters on sliding windows (2010), http://arxiv.org/abs/1012.3130
Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. Journal of Computer and System Sciences 60(3), 630–659 (2000)
Broder, A.Z., Glassman, S.C., Manasse, M.S., Zweig, G.: Syntactic clustering of the web. Computer Networks and ISDN Systems 29(8-13), 1157–1166 (1997)
Broder, A.: On the resemblance and containment of documents. In: Proceedings of the Compression and Complexity of Sequences 1997, pp. 21–29 (1997)
Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Widmayer, P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP 2002. LNCS, vol. 2380, pp. 693–703. Springer, Heidelberg (2002)
Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences 55(3), 441–453 (1997)
Cohen, E., Strauss, M.J.: Maintaining time-decaying stream aggregates. Journal of Algorithms 59(1), 19–36 (2006)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: The count-min sketch and its applications. In: Farach-Colton, M. (ed.) LATIN 2004. LNCS, vol. 2976, pp. 29–38. Springer, Heidelberg (2004)
Cormode, G., Hadjieleftheriou, M.: Finding frequent items in data streams. Proc. VLDB Endow. 1(2), 1530–1541 (2008)
Cormode, G., Korn, F., Muthukrishnan, S., Srivastava, D.: Finding hierarchical heavy hitters in data streams. In: VLDB 2003, pp. 464–475 (2003)
Cormode, G., Muthukrishnan, S.: What’s hot and what’s not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), 249–278 (2005)
Datar, M., Muthukrishnan, S.: Estimating rarity and similarity over data stream windows. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 323–334. Springer, Heidelberg (2002)
Datar, M., Gionis, A., Indyk, P., Motwani, R.: Maintaining stream statistics over sliding windows (extended abstract). In: SODA 2002, pp. 635–644 (2002)
Demaine, E.D., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 348–360. Springer, Heidelberg (2002)
Estan, C., Varghese, G.: New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans. Comput. Syst. 21(3), 270–313 (2003)
Flajolet, P., Martin, G.N.: Probabilistic counting. In: FOCS 1983, pp. 76–82 (1983)
Gibbons, P.B., Tirthapura, S.: Estimating simple functions on the union of data streams. In: SPAA 2001, pp. 281–291 (2001)
Golab, L., DeHaan, D., Demaine, E.D., López-Ortiz, A., Munro, J.I.: Identifying frequent items in sliding windows over on-line packet streams. In: IMC 2003, pp. 173–178 (2003)
Hung, R.Y.S., Ting, H.F.: Finding heavy hitters over the sliding window of a weighted data stream. In: Laber, E.S., Bornstein, C., Nogueira, L.T., Faria, L. (eds.) LATIN 2008. LNCS, vol. 4957, pp. 699–710. Springer, Heidelberg (2008)
Hung, R.Y., Lee, L.K., Ting, H.: Finding frequent items over sliding windows with constant update time. Information Processing Letters 110(7), 257–260 (2010)
Indyk, P.: A small approximately min-wise independent family of hash functions. In: SODA 1999, pp. 454–456 (1999)
Indyk, P.: Heavy hitters and sparse approximations, lecture notes (2009), http://people.csail.mit.edu/indyk/Rice/lec4.pdf
Indyk, P., Woodruff, D.: Optimal approximations of the frequency moments of data streams. In: STOC 2005, pp. 202–208 (2005)
Jin, C., Qian, W., Sha, C., Yu, J.X., Zhou, A.: Dynamically maintaining frequent items over a data stream. In: CIKM 2003, pp. 287–294 (2003)
Kane, D.M., Nelson, J., Woodruff, D.P.: An optimal algorithm for the distinct elements problem. In: PODS 2010, pp. 41–52 (2010)
Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28, 51–55 (2003)
Lee, L.K., Ting, H.F.: A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: PODS 2006, pp. 290–297 (2006)
Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: VLDB 2002, pp. 346–357 (2002)
Open problems in data streams and related topics. IITK Workshop on Algorithms for Data Streams 2006 (2006), compiled and edited by McGregor, A.
Metwally, A., Agrawal, D., El Abbadi, A.: Efficient computation of frequent and top-k elements in data streams. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, pp. 398–412. Springer, Heidelberg (2005)
Muthukrishnan, S.: Data streams: Algorithms and applications. Now Publishers Inc. (2005)
Nie, G., Lu, Z.: Approximate frequency counts in sliding window over data stream. In: Canadian Conference on Electrical and Computer Engineering, pp. 2232–2236 (2005)
Saks, M., Sun, X.: Space lower bounds for distance approximation in the data stream model. In: STOC 2002, pp. 360–369 (2002)
Sen, S., Wang, J.: Analyzing peer-to-peer traffic across large networks. In: IMW 2002, pp. 137–150. ACM (2002)
Tirthapura, S., Woodruff, D.P.: A general method for estimating correlated aggregates over a data stream. In: International Conference on Data Engineering, pp. 162–173 (2012)
Zhang, L., Guan, Y.: Frequency estimation over sliding windows. In: International Conference on Data Engineering, pp. 1385–1387 (2008)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Braverman, V., Gelles, R., Ostrovsky, R. (2013). How to Catch L_2-Heavy-Hitters on Sliding Windows. In: Du, DZ., Zhang, G. (eds) Computing and Combinatorics. COCOON 2013. Lecture Notes in Computer Science, vol 7936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38768-5_56
DOI: https://doi.org/10.1007/978-3-642-38768-5_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38767-8
Online ISBN: 978-3-642-38768-5
eBook Packages: Computer Science (R0)