DOI: 10.1145/276304.276342

Approximate medians and other quantiles in one pass and with limited memory

Published: 1 June 1998

ABSTRACT

We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit and hold for arbitrary value distributions and arrival distributions of the dataset. The main memory requirements are an order of magnitude smaller than those reported earlier.
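
The abstract does not spell out the algorithms, so what follows is only a minimal sketch of the general buffer-collapsing idea behind one-pass quantile summaries. It is not the paper's method. It assumes buffers of k values that are merged pairwise with a simple binary-counter policy, and the class `CollapseQuantileSketch` and its methods are hypothetical names. A buffer at level L stands for k * 2**L input values. Collapsing two buffers at the same level keeps every other element of their sorted union, so memory grows with the number of occupied levels rather than with the length of the stream.

```python
import math


class CollapseQuantileSketch:
    """Hypothetical one-pass quantile summary built from k-value buffers.

    A buffer at level L stands for k * 2**L input values. Two buffers at the
    same level are collapsed into one buffer at level L + 1, like a carry in
    binary addition, so at most one buffer exists per level.
    """

    def __init__(self, k):
        self.k = k
        self.pending = []      # up to k - 1 values not yet in a full buffer
        self.levels = {}       # level -> sorted list of k retained values
        self.count = 0         # number of values seen so far
        self._start = 0        # alternates 0/1 to balance collapse rounding

    def insert(self, x):
        self.count += 1
        self.pending.append(x)
        if len(self.pending) == self.k:
            buf = sorted(self.pending)
            self.pending = []
            self._carry(buf, 0)

    def _carry(self, buf, level):
        # Collapse equal-level buffers until a free level is found.
        while level in self.levels:
            merged = sorted(buf + self.levels.pop(level))  # 2k values
            buf = merged[self._start::2]                    # keep k of them
            self._start ^= 1
            level += 1
        self.levels[level] = buf

    def query(self, phi):
        """Return a value whose weighted rank is about phi * count (0 <= phi <= 1)."""
        weighted = [(v, 1) for v in self.pending]
        for level, buf in self.levels.items():
            weighted.extend((v, 2 ** level) for v in buf)
        if not weighted:
            raise ValueError("no values inserted")
        weighted.sort()
        target = max(1, math.ceil(phi * self.count))
        running = 0
        for v, w in weighted:
            running += w
            if running >= target:
                return v
        return weighted[-1][0]
```

In this sketch, a collapse at level L can shift the weighted count of values at or below any threshold by at most 2**L, so the total rank error is bounded by the sum over all collapses performed. Choosing buffer counts, buffer sizes, and a collapse policy that make this error explicit for a target precision with as little memory as possible is what the paper addresses. This toy version makes no such guarantee.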

We also discuss methods that couple the approximation algorithms with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e., they hold with respect to a user-controlled confidence parameter.
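
To illustrate how a confidence parameter enters, here is a minimal sketch assuming the stream can be sampled uniformly in one pass with a reservoir sample (Algorithm R). The sample size comes from a standard Hoeffding-style argument, which is not necessarily the bound derived in the paper. If a fraction p of the dataset lies at or below some value, the corresponding fraction in a uniform sample of size s deviates from p by at least ε with probability at most exp(-2sε²) in each direction. A union bound over the two tails then suggests s ≥ ln(2/δ) / (2ε²) for a single quantile with rank error about εN and failure probability about δ. As a simplification, this sketch takes the exact quantile of the sample instead of feeding the sample into a deterministic one-pass algorithm. All function names are hypothetical.

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Smallest s with 2 * exp(-2 * s * eps**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def reservoir_sample(stream, s, rng):
    """Algorithm R: a uniform sample of up to s items in a single pass."""
    reservoir = []
    for i, x in enumerate(stream):
        if i < s:
            reservoir.append(x)
        else:
            j = rng.randint(0, i)   # uniform over 0..i inclusive
            if j < s:
                reservoir[j] = x
    return reservoir


def sampled_quantile(stream, phi, eps, delta, seed=None):
    """Approximate phi-quantile of a non-empty stream from a Hoeffding-sized sample."""
    rng = random.Random(seed)
    s = hoeffding_sample_size(eps, delta)
    sample = sorted(reservoir_sample(stream, s, rng))
    idx = min(len(sample) - 1, max(0, math.ceil(phi * len(sample)) - 1))
    return sample[idx]
```

The sample size depends only on ε and δ, not on the size of the dataset. For ε = 0.01 and δ = 0.001 it is 38,005 values, which is why sampling can reduce memory for very large inputs.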

We present the algorithms, their theoretical analysis and simulation results on different datasets.


Published in

SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data
June 1998
599 pages
ISBN: 0897919955
DOI: 10.1145/276304

Copyright © 1998 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers: Article

Overall acceptance rate: 785 of 4,003 submissions, 20%
