A supplement to sampling-based methods for query size estimation in a database system

Published: 1 December 1992

Abstract

Sampling-based methods for estimating the sizes of relations produced by relational operators such as selections, joins and projections have been studied intensively in recent years. Methods of this type can achieve high estimation accuracy and efficiency. Since the dominant overhead of a sampling-based method is the sampling cost, several variants have been proposed that minimize the sampling percentage (and thus the sampling cost) while maintaining the required estimation accuracy, expressed as a confidence level and a relative error (defined precisely in Section 2). Determining the minimal sampling percentage requires overall characteristics of the data, such as their mean and variance. The representative sampling-based methods in the literature assume that these characteristics are unavailable, and therefore devote considerable effort to estimating them in order to approach the optimal (minimal) sampling percentage. Estimating these characteristics incurs its own cost and is itself subject to estimation error. In this short essay, we point out that the exact values of these characteristics can be tracked in a database system at negligible overhead. As a result, the minimal sampling percentage that guarantees the specified relative error and confidence level can be determined precisely.
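The abstract's argument can be illustrated with a minimal Python sketch. It assumes that the quantity being estimated is a mean (equivalently, a total) of a per-unit value y over N units, that units are drawn by simple random sampling without replacement, and that the textbook normal-approximation sample-size formula applies: n0 = (z·S / (ε·Ȳ))² and n = n0 / (1 + n0/N), where Ȳ and S² are the exact mean and variance, ε is the relative error and z is the two-sided normal quantile for the confidence level. These are assumptions, not the paper's own definitions from its Section 2, and the names `RunningStats` and `min_sample_size` are hypothetical. What the sketch shows is the abstract's point: N, Σy and Σy² can be updated in constant time on every insert, delete or update, so Ȳ and S² are available exactly and the minimal sample size needs no separate estimation of them.

```python
import math
from statistics import NormalDist


class RunningStats:
    """Exact N, sum and sum of squares of a per-unit value y, maintained in O(1).

    Hypothetical: which value y is tracked per unit (e.g. qualifying tuples
    per block) is an illustrative assumption, not taken from the paper.
    """

    def __init__(self) -> None:
        self.n = 0         # number of units N
        self.total = 0     # sum of y
        self.total_sq = 0  # sum of y**2

    def insert(self, y: int) -> None:
        self.n += 1
        self.total += y
        self.total_sq += y * y

    def delete(self, y: int) -> None:
        self.n -= 1
        self.total -= y
        self.total_sq -= y * y

    def update(self, old: int, new: int) -> None:
        self.delete(old)
        self.insert(new)

    def mean(self) -> float:
        return self.total / self.n

    def variance(self) -> float:
        # S^2 with the N - 1 divisor; the numerator is exact for integer y.
        return (self.n * self.total_sq - self.total ** 2) / (self.n * (self.n - 1))


def min_sample_size(stats: RunningStats, rel_error: float, confidence: float) -> int:
    """Smallest n (SRS without replacement, normal approximation) such that
    P(|estimate - true| <= rel_error * true) is approximately >= confidence."""
    if stats.n < 2 or stats.mean() == 0:
        raise ValueError("need at least two units and a nonzero mean")
    N = stats.n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided normal quantile
    n0 = (z * z * stats.variance()) / (rel_error * stats.mean()) ** 2
    n = n0 / (1 + n0 / N)  # finite-population correction
    return min(N, math.ceil(n))


if __name__ == "__main__":
    stats = RunningStats()
    for i in range(10_000):
        stats.insert(i % 10)  # illustrative per-unit values 0..9
    n = min_sample_size(stats, rel_error=0.05, confidence=0.95)
    print(n, "units, sampling percentage", n / stats.n)
```

The sampling percentage is n / N. Keeping the sum of squares as an exact integer avoids the cancellation error that a floating-point running sum of squares can suffer; with non-integer values, a numerically stabler update scheme would be worth considering.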

Published in ACM SIGMOD Record, Volume 21, Issue 4, December 1992 (56 pages)
ISSN: 0163-5808
DOI: 10.1145/141818
Editor: Arie Segev

Copyright © 1992 Authors

Publisher: Association for Computing Machinery, New York, NY, United States