Workload-Optimal Histograms on Streams

  • Conference paper
Algorithms – ESA 2005 (ESA 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3669)

Abstract

A histogram is a piecewise-constant approximation of an observed data distribution. It serves as a small-space, approximate synopsis of the underlying data distribution, which is often too large to be stored precisely. Histograms have found many applications in database management systems, perhaps most commonly for query selectivity estimation in query optimizers [1], but also in approximate query answering [2], load balancing in parallel join execution [3], mining time-series data [4], partition-based temporal join execution, query profiling for user feedback, etc. Ioannidis gives an overview of the history of histograms, their applications, and their use in commercial DBMSs [5]. Poosala’s thesis provides a systematic treatment of different types of histograms [3].
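
As a minimal illustration of the definition above, the following sketch builds a piecewise-constant approximation of a small array and uses it to estimate a range sum, the kind of estimate behind selectivity estimation. It assumes equal-width buckets that each store their mean; this is a deliberately simple stand-in, not the workload-optimal construction studied in the paper, and the function names (`equi_width_histogram`, `reconstruct`, `estimate_range_sum`) are hypothetical.

```python
from typing import List, Tuple

# A bucket covers indices [start, end) and stores one representative value.
Bucket = Tuple[int, int, float]


def equi_width_histogram(data: List[float], num_buckets: int) -> List[Bucket]:
    """Split the index range into near-equal pieces and store each piece's mean.

    Assumption: equal-width buckets, chosen only for simplicity.
    """
    n = len(data)
    buckets: List[Bucket] = []
    for b in range(num_buckets):
        lo = b * n // num_buckets
        hi = (b + 1) * n // num_buckets
        if hi > lo:  # skip empty buckets when num_buckets > n
            buckets.append((lo, hi, sum(data[lo:hi]) / (hi - lo)))
    return buckets


def reconstruct(buckets: List[Bucket], n: int) -> List[float]:
    """Expand the histogram back into a piecewise-constant array of length n."""
    approx = [0.0] * n
    for lo, hi, mean in buckets:
        for i in range(lo, hi):
            approx[i] = mean
    return approx


def estimate_range_sum(buckets: List[Bucket], left: int, right: int) -> float:
    """Estimate sum(data[left:right]) using only the bucket summaries."""
    total = 0.0
    for lo, hi, mean in buckets:
        overlap = min(hi, right) - max(lo, left)
        if overlap > 0:
            total += overlap * mean
    return total


data = [1.0, 1.0, 2.0, 2.0, 9.0, 9.0, 10.0, 10.0]
hist = equi_width_histogram(data, 2)
approx = reconstruct(hist, len(data))
estimate = estimate_range_sum(hist, 2, 6)
```

Here eight values are summarized by two buckets, so the synopsis stores two means instead of eight numbers, and range queries are answered from the buckets alone.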

References

  1. Ioannidis, Y., Christodoulakis, S.: Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Trans. Database Syst. 18, 709–748 (1993)

  2. Acharya, S., Gibbons, P., Poosala, V., Ramaswamy, S.: The Aqua approximate query answering system. In: SIGMOD Conference, pp. 574–576 (1999)

  3. Poosala, V.: Histogram-based estimation techniques in database systems. PhD thesis, Univ. of Wisconsin (1997)

  4. Keogh, E., Chakrabarti, K., Mehrotra, S., Pazzani, M.: Locally adaptive dimensionality reduction for indexing large time series databases. In: Proc. SIGMOD (2001)

  5. Ioannidis, Y.: The history of histograms (abridged). In: Proc. VLDB (2003)

  6. Ioannidis, Y., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: Proc. SIGMOD, pp. 233–244 (1995)

  7. Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K., Suel, T.: Optimal histograms with quality guarantees. In: Proc. VLDB, pp. 275–286 (1998)

  8. Muthukrishnan, S.: Data stream algorithms and applications (2003), http://www.cs.rutgers.edu/~muthu/stream-1-1.ps

  9. Guha, S., Koudas, N., Shim, K.: Data-streams and histograms. In: Proc. ACM STOC, pp. 471–475 (2001)

  10. Guha, S., Koudas, N.: Approximating a data stream for querying and estimation: Algorithms and performance evaluation. In: Proc. ICDE (2002)

  11. Guha, S., Indyk, P., Muthukrishnan, S., Strauss, M.: Histogramming data streams with fast per-item processing. In: Proc. 29th ICALP, pp. 681–692 (2002)

  12. Gilbert, A., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Fast, small-space algorithms for approximate histogram maintenance. In: Proc. ACM STOC, pp. 389–398 (2002)

  13. Chen, C., Roussopoulos, N.: Adaptive selectivity estimation using query feedback. In: Proc. ACM SIGMOD (1994)

  14. Konig, A., Weikum, G.: Combining histograms and parametric curve fitting for feedback driven query result size estimation. In: Proc. VLDB (1999)

  15. Aboulnaga, A., Chaudhuri, S.: Self-tuning histograms: Building histograms without looking at data. In: Proc. ACM SIGMOD (1999)

  16. Qiao, L., Agrawal, D., Abbadi, A.E.: RHist: adaptive summarization over continuous data streams. In: Proc. CIKM, pp. 469–476 (2002)

  17. Ganti, V., Lee, M., Ramakrishnan, R.: Icicles–self-tuning samples for approximate query answering. In: Proc. VLDB (2000)

  18. Stillger, M., Lohman, G., Markl, V., Kandil, M.: LEO - DB2’s learning optimizer. In: Proc. VLDB, pp. 19–28 (2001)

  19. Muthukrishnan, S.: Nonuniform sparse approximation theory with Haar wavelets. Technical report, DIMACS (2004)

  20. Guha, S.: A note on wavelet optimization (2004), http://www.cis.upenn.edu/~sudipto/notes/wavelet.pdf.gz

  21. Matias, Y., Urieli, D.: Optimal workload-based wavelet synopses, Technical report, TAU (2004)

  22. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24, 530–536 (1978)

  23. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. IEEE FOCS, pp. 390–398 (2000)

  24. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23, 337–343 (1977)

  25. Muthukrishnan, S., Strauss, M., Zheng, X.: Workload-optimal histograms on streams. Technical report, DIMACS (2005)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Muthukrishnan, S., Strauss, M., Zheng, X. (2005). Workload-Optimal Histograms on Streams. In: Brodal, G.S., Leonardi, S. (eds) Algorithms – ESA 2005. ESA 2005. Lecture Notes in Computer Science, vol 3669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11561071_65

  • DOI: https://doi.org/10.1007/11561071_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29118-3

  • Online ISBN: 978-3-540-31951-1

  • eBook Packages: Computer Science, Computer Science (R0)
