Dictionary-Based Data Compression

  • Reference work entry
Encyclopedia of Algorithms

Years and Authors of Summarized Original Work

  • 1977; Ziv, Lempel

Problem Definition

Lossless data compression is the problem of compactly representing data in a format that admits the faithful recovery of the original information. It is achieved by exploiting the redundancy that is often present in data generated by humans or machines.
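
The two requirements in this definition, compact representation and faithful recovery, can be checked empirically. The following is a minimal sketch, assuming Python's standard zlib module (whose DEFLATE format combines Ziv-Lempel style matching with Huffman coding) as the compressor. The helper name round_trip and the two sample inputs are illustrative choices, not part of the original entry.

```python
import os
import zlib

def round_trip(data: bytes) -> tuple:
    """Compress with zlib, then decompress.

    Returns (compressed_size, recovered_exactly).
    """
    packed = zlib.compress(data, 9)
    return len(packed), zlib.decompress(packed) == data

# Hypothetical sample inputs chosen for illustration only.
redundant = b"abracadabra " * 1000   # highly repetitive, 12,000 bytes
random_like = os.urandom(12_000)     # essentially no redundancy to exploit

for name, data in [("redundant", redundant), ("random", random_like)]:
    size, ok = round_trip(data)
    print(f"{name}: {len(data)} -> {size} bytes, lossless={ok}")
```

Both inputs are recovered exactly, but only the repetitive one shrinks noticeably. The random input typically comes out slightly larger than it went in, since there is no redundancy for the compressor to use.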

Dictionary-based data compression has been “the solution” to the problem of lossless data compression for nearly 15 years. This technique originated in two theoretical papers by Ziv and Lempel [15, 16] and gained popularity in the 1980s with the introduction of the Unix tool compress (1986) and of the GIF image format (1987). Although today there are alternative solutions to the problem of lossless data compression (e.g., Burrows-Wheeler compression and Prediction by Partial Matching), dictionary-based compression is still widely used in everyday applications: consider for example the zip...

Recommended Reading

  1. Arroyuelo D, Navarro G, Sadakane K (2006) Reducing the space requirement of LZ-index. In: Proceedings of the 17th combinatorial pattern matching conference (CPM). LNCS, vol 4009. Springer, pp 318–329

  2. Charikar M, Lehman E, Liu D, Panigrahy R, Prabhakaran M, Sahai A, Shelat A (2005) The smallest grammar problem. IEEE Trans Inf Theory 51:2554–2576

  3. Cormode G, Muthukrishnan S (2005) Substring compression problems. In: Proceedings of the 16th ACM-SIAM symposium on discrete algorithms (SODA ’05), pp 321–330

  4. Crochemore M, Landau G, Ziv-Ukelson M (2003) A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM J Comput 32:1654–1673

  5. Ferragina P, Manzini G (2005) Indexing compressed text. J ACM 52:552–581

  6. Kosaraju R, Manzini G (1999) Compression of low entropy strings with Lempel–Ziv algorithms. SIAM J Comput 29:893–911

  7. Krishnan P, Vitter J (1998) Optimal prediction for prefetching in the worst case. SIAM J Comput 27:1617–1636

  8. Lifshits Y, Mozes S, Weimann O, Ziv-Ukelson M (2007) Speeding up HMM decoding and training by exploiting sequence repetitions. Springer

  9. Matias Y, Sahinalp C (1999) On the optimality of parsing in dynamic dictionary based data compression. In: Proceedings of the 10th annual ACM-SIAM symposium on discrete algorithms (SODA ’99), pp 943–944

  10. Navarro G (2004) Indexing text using the Ziv–Lempel trie. J Discret Algorithm 2:87–114

  11. Navarro G, Tarhio J (2005) LZgrep: a Boyer-Moore string matching tool for Ziv–Lempel compressed text. Softw Pract Exp 35:1107–1130

  12. Sahinalp C, Rajpoot N (2003) Dictionary-based data compression: an algorithmic perspective. In: Sayood K (ed) Lossless compression handbook. Academic Press, pp 153–167

  13. Salomon D (2007) Data compression: the complete reference, 4th edn. Springer, London

  14. Savari S (1997) Redundancy of the Lempel–Ziv incremental parsing rule. IEEE Trans Inf Theory 43:9–21

  15. Ziv J, Lempel A (1977) A universal algorithm for sequential data compression. IEEE Trans Inf Theory 23:337–343

  16. Ziv J, Lempel A (1978) Compression of individual sequences via variable-length coding. IEEE Trans Inf Theory 24:530–536

Copyright information

© 2016 Springer Science+Business Media New York

Cite this entry

Gagie, T., Manzini, G. (2016). Dictionary-Based Data Compression. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_108
