Skip to main content

Database Compression Using an Offline Dictionary Method

  • Conference paper
  • First Online:
Advances in Information Systems (ADVIS 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2457))

Included in the following conference series:

Abstract

Off-line dictionary compression is becoming more attractive for applications where compressed data are searched directly in compressed form. While there has been large body of related work describing specific database compression algorithms, the Hibase [10] architecture is unique in processing queries in compressed data. However, this technique does not compress the representation of strings in the domain dictionaries. Primary keys, data with high cardinality and semi-structured data contribute very little or no compression. To achieve high performance irrespective of type of data, the string representation must be in compressed form. At the same time, the direct addressability of compressed data is maintained. Serial compression techniques cannot be used. In this paper, we present a prefix dictionary-based off-line method that can be incorporated with systems like Hibase where compressed data can be accessed directly without prior decompression. The complexity is O(n) in time and space.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Huffman D. A. A method for the construction of minimum-redundancy code. Proc. IRE, 40:9:1098–1101, 1952.

    Google Scholar 

  2. W.P. Cockshott, D. McGregor, and Wilson J. Kotsis, N. Data compression in database systems. IDEAS’98, Cardi., July, 1998.

    Google Scholar 

  3. G. P. Copeland and S. Khoshafian. A decomposition storage model. In Proceedings of the 1985 ACM SIGMOD, Austin, Texas, May 1985.

    Google Scholar 

  4. G.V. Cormack. Data compression on a database system. Communication of the ACM, 28:12, 1985.

    Google Scholar 

  5. J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing relations and indexes. Int Proce. IEEE Conf on Data Engineering, 1998.

    Google Scholar 

  6. McGregor D. R. Hoque A. S. M. L. Storage and querying high dimensional sparsely populated data in compressed representation. In Accepted in EurAsia-ICT, Tehran, Iran, October 2002.

    Google Scholar 

  7. Bentley J. and Mcllroy D. Data compression using long common strings. http://www.bell-labs.com .

  8. Larson N. J. and Moffat A. Off-line dictionary-based compression. Proceedings of the IEEE, 88:11, 2000.

    Google Scholar 

  9. M Neumuller. Compact data structure for querying xml. Proceedings of EDTB, 2002.

    Google Scholar 

  10. Cockshott W. P., MCGregor D., and Wilson J. High-performance operations using a compressed architecture. The Computer Journal, 41:5:283–296, 1998.

    Article  MATH  Google Scholar 

  11. F. Rubin. Experiments in text compression. Commun. ACM, 19, November 1976.

    Google Scholar 

  12. D. Solomon. Data Compression The Complete Reference. Springer-Verlag New York, Inc., 1998.

    Google Scholar 

  13. J.A. Storer and T.G. Szymanski. Data compression via textual substitution. Journal of the ACM, 29:928–951, 1982.

    Article  MATH  MathSciNet  Google Scholar 

  14. Weltch T.A. A technique for high-performance data compression. Computer, 17:6:8–19, 1984.

    Article  Google Scholar 

  15. T. Westmann, D. Kossmann, S. Helmer, and G. Moerkotte. The implementation and performance of compressed databases. SIGMOD Record, 29:3:55–67, 2000.

    Article  Google Scholar 

  16. J. Ziv and A. Lempel. Compression of individual sequences via variable rate coding. IEEE Transaction on Information Theory, IT-23(3):337–343, 1978.

    MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hoque, A.S.M.L., McGregor, D., Wilson, J. (2002). Database Compression Using an Offline Dictionary Method. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2002. Lecture Notes in Computer Science, vol 2457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36077-8_2

Download citation

  • DOI: https://doi.org/10.1007/3-540-36077-8_2

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00009-9

  • Online ISBN: 978-3-540-36077-3

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics