Abstract
A histogram is a piecewise-constant approximation of an observed data distribution. It serves as a small-space, approximate synopsis of the underlying data distribution, which is often too large to be stored exactly. Histograms have found many applications in database management systems, perhaps most commonly for query selectivity estimation in query optimizers [1], but also in approximate query answering [2], load balancing in parallel join execution [3], mining time-series data [4], partition-based temporal join execution, query profiling for user feedback, etc. Ioannidis gives an overview of the history of histograms, their applications, and their use in commercial DBMSs [5]. Poosala’s thesis provides a systematic treatment of different types of histograms [3].
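To make the piecewise-constant idea concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the simplest possible bucketing (equal-width buckets over the index range, each storing its average) and shows how a range sum, the kind of quantity used in selectivity estimation, can be estimated from the synopsis alone. The function names `build_equiwidth_histogram` and `estimate_range_sum` are hypothetical and chosen only for illustration.

```python
def build_equiwidth_histogram(data, num_buckets):
    """Split indices [0, n) into contiguous buckets and store each bucket's average.

    Assumption for illustration: equal-width buckets. Optimal or
    workload-aware bucket boundaries are not computed here.
    """
    n = len(data)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and len(data)")
    # Bucket i covers indices boundaries[i] .. boundaries[i + 1] - 1.
    boundaries = [i * n // num_buckets for i in range(num_buckets + 1)]
    heights = [
        sum(data[lo:hi]) / (hi - lo)
        for lo, hi in zip(boundaries, boundaries[1:])
    ]
    return boundaries, heights


def estimate_range_sum(boundaries, heights, start, end):
    """Estimate sum(data[start:end]) using only the histogram."""
    total = 0.0
    for lo, hi, height in zip(boundaries, boundaries[1:], heights):
        # Number of queried indices that fall inside this bucket.
        overlap = min(hi, end) - max(lo, start)
        if overlap > 0:
            total += height * overlap
    return total


if __name__ == "__main__":
    data = [2, 2, 2, 2, 10, 10, 10, 10]
    boundaries, heights = build_equiwidth_histogram(data, 2)
    print(boundaries, heights)
    print(estimate_range_sum(boundaries, heights, 2, 6), sum(data[2:6]))
```

The synopsis stores one number per bucket instead of one per data item. The estimate is exact when the data is constant within each bucket and degrades as values vary inside a bucket.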
References
Ioannidis, Y., Christodoulakis, S.: Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Trans. Database Syst. 18, 709–748 (1993)
Acharya, S., Gibbons, P., Poosala, V., Ramaswamy, S.: The Aqua approximate query answering system. In: SIGMOD Conference, pp. 574–576 (1999)
Poosala, V.: Histogram-based estimation techniques in database systems. PhD thesis, Univ. of Wisconsin (1997)
Keogh, E., Chakrabarti, K., Mehrotra, S., Pazzani, M.: Locally adaptive dimensionality reduction for indexing large time series databases. In: Proc. SIGMOD (2001)
Ioannidis, Y.: The history of histograms (abridged). In: Proc. VLDB (2003)
Ioannidis, Y., Poosala, V.: Balancing histogram optimality and practicality for query result size estimation. In: Proc. SIGMOD, pp. 233–244 (1995)
Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K., Suel, T.: Optimal histograms with quality guarantees. In: Proc. VLDB, pp. 275–286 (1998)
Muthukrishnan, S.: Data stream algorithms and applications (2003), http://www.cs.rutgers.edu/~muthu/stream-1-1.ps
Guha, S., Koudas, N., Shim, K.: Data-streams and histograms. In: Proc. ACM STOC, pp. 471–475 (2001)
Guha, S., Koudas, N.: Approximating a data stream for querying and estimation: Algorithms and performance evaluation. In: Proc. ICDE (2002)
Guha, S., Indyk, P., Muthukrishnan, S., Strauss, M.: Histogramming data streams with fast per-item processing. In: Proc 29th ICALP, pp. 681–692 (2002)
Gilbert, A., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Fast, small-space algorithms for approximate histogram maintenance. In: Proc. ACM STOC, pp. 389–398 (2002)
Chen, C., Roussopoulos, N.: Adaptive selectivity estimation using query feedback. In: Proc. ACM SIGMOD (1994)
König, A., Weikum, G.: Combining histograms and parametric curve fitting for feedback driven query result size estimation. In: Proc. VLDB (1999)
Aboulnaga, A., Chaudhuri, S.: Self-tuning histograms: Building histograms without looking at data. In: Proc. ACM SIGMOD (1999)
Qiao, L., Agrawal, D., Abbadi, A.E.: RHist: adaptive summarization over continuous data streams. In: Proc. CIKM, pp. 469–476 (2002)
Ganti, V., Lee, M., Ramakrishnan, R.: ICICLES: Self-tuning samples for approximate query answering. In: Proc. VLDB (2000)
Stillger, M., Lohman, G., Markl, V., Kandil, M.: LEO - DB2’s learning optimizer. In: Proc. VLDB, pp. 19–28 (2001)
Muthukrishnan, S.: Nonuniform sparse approximation theory with Haar wavelets. Technical report, DIMACS (2004)
Guha, S.: A note on wavelet optimization (2004), http://www.cis.upenn.edu/~sudipto/notes/wavelet.pdf.gz
Matias, Y., Urieli, D.: Optimal workload-based wavelet synopses, Technical report, TAU (2004)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24, 530–536 (1978)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. IEEE FOCS, pp. 390–398 (2000)
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23, 337–343 (1977)
Muthukrishnan, S., Strauss, M., Zheng, X.: Workload-optimal histograms on streams. Technical report, DIMACS (2005)
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Muthukrishnan, S., Strauss, M., Zheng, X. (2005). Workload-Optimal Histograms on Streams. In: Brodal, G.S., Leonardi, S. (eds) Algorithms – ESA 2005. ESA 2005. Lecture Notes in Computer Science, vol 3669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11561071_65
DOI: https://doi.org/10.1007/11561071_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29118-3
Online ISBN: 978-3-540-31951-1
eBook Packages: Computer Science (R0)