Skip to main content

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8066))

Abstract

This paper is a survey on the problem of storing a string in compressed format, so that (a) the resulting space is close to the high-order empirical entropy of the string, which is a lower bound on the compression achievable with text compressors based on contexts, and (b) constant-time access is still provided to the string as if it was uncompressed. This is obviously better than decompressing (a large portion of) the whole string each time a random access to one of its substrings is needed. A storage scheme that satisfies these requirements can thus replace the trivial explicit representation of a text in any data structure that requires random access to it, alleviating the algorithmic designer from the task of compressing it.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Barbay, J., He, M., Munro, J.I., Satti, S.R.: Succinct indexes for strings, binary relations and multilabeled trees. ACM Transactions on Algorithms 7(4), 52 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  2. Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings. In: SODA, pp. 373–389 (2011)

    Google Scholar 

  3. Brisaboa, N.R., Ladra, S., Navarro, G.: Directly addressable variable-length codes. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 122–130. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  4. Brisaboa, N.R., Cánovas, R., Claude, F., Martínez-Prieto, M.A., Navarro, G.: Compressed string dictionaries. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp. 136–147. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  5. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation (1994)

    Google Scholar 

  6. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Transactions on Information Theory 51(7), 2554–2576 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  7. Demaine, E.D., López-Ortiz, A.: A linear lower bound on index size for text retrieval. J. Algorithms 48(1), 2–15 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  8. Dodis, Y., Patrascu, M., Thorup, M.: Changing base without losing space. In: STOC, pp. 593–602 (2010)

    Google Scholar 

  9. Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. Theor. Comput. Sci. 372(1), 115–121 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  10. Fraenkel, A.S., Kleinb, S.T.: Robust universal complete codes for transmission and compression. Discrete Applied Mathematics 64(1), 31–55 (1996)

    Article  Google Scholar 

  11. Fredman, M.L., Saks, M.E.: The cell probe complexity of dynamic data structures. In: STOC, pp. 345–354 (1989)

    Google Scholar 

  12. Fredriksson, K., Nikitin, F.: Simple random access compression. Fundam. Inform. 92(1-2), 63–81 (2009)

    MathSciNet  MATH  Google Scholar 

  13. Gál, A., Miltersen, P.B.: The cell probe complexity of succinct data structures. Theor. Comput. Sci. 379, 405–417 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  14. Golynski, A.: Optimal lower bounds for rank and select indexes. Theor. Comput. Sci. 387, 348–359 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  15. González, R., Navarro, G.: Statistical encoding of succinct data structures. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 294–305. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  16. Grossi, R.: A quick tour on suffix arrays and compressed suffix arrays. Theor. Comput. Sci. 412(27), 2964–2973 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  17. Grossi, R., Raman, R., Rao, S.S., Venturini, R.: Dynamic compressed strings with random access. In: Fomin, F.V., Freivalds, R., Kwiatkowska, M., Peleg, D. (eds.) ICALP 2013, Part I. LNCS, vol. 7965, pp. 504–515. Springer, Heidelberg (2013)

    Google Scholar 

  18. Hon, W.-K., Shah, R., Vitter, J.S.: Compression, indexing, and retrieval for massive string data. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 260–274. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  19. Jansson, J., Sadakane, K., Sung, W.K.: Cram: Compressed random access memory. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP 2012, Part I. LNCS, vol. 7391, pp. 510–521. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  20. Kosaraju, R., Manzini, G.: Compression of low entropy strings with Lempel-Ziv algorithms. SIAM Journal of Computing 29(3), 893–911 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  21. Larsson, N.J., Moffat, A.: Offline dictionary-based compression. In: Data Compression Conference, pp. 296–305 (1999)

    Google Scholar 

  22. Lohrey, M.: Algorithmics on slp-compressed strings: A survey. Groups Complexity Cryptology 4(2), 241–299 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  23. Manber, U.: A text compression scheme that allows fast searching directly in the compressed file. ACM Trans. Inf. Syst. 15(2), 124–136 (1997)

    Article  Google Scholar 

  24. Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  25. Munro, J.I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)

    Chapter  Google Scholar 

  26. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007)

    Google Scholar 

  27. Ottaviano, G., Grossi, R.: Fast compressed tries through path decompositions. In: ALENEX, pp. 65–74 (2012)

    Google Scholar 

  28. Patrascu, M., Viola, E.: Cell-probe lower bounds for succinct partial sums. In: Charikar, M. (ed.) SODA, pp. 117–122. SIAM (2010)

    Google Scholar 

  29. Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms 3(4) (2007)

    Google Scholar 

  30. Raman, R., Rao, S.S.: Succinct representations of ordinal trees. In: Brodnik, A., López-Ortiz, A., Raman, V., Viola, A. (eds.) Munro Festschrift 2013. LNCS, vol. 8066, pp. 319–332. Springer, Heidelberg (2013)

    Google Scholar 

  31. Rytter, W.: Application of lempel-ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  32. Sadakane, K.: Personal communication (2012)

    Google Scholar 

  33. Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: Proc. of the 17th ACM-SIAM SODA, pp. 1230–1239 (2006)

    Google Scholar 

  34. Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: SODA, pp. 1230–1239. ACM Press (2006)

    Google Scholar 

  35. Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 247–258. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  36. Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers (1999)

    Google Scholar 

  37. Witten, I.H., Neal, R.M., Cleary, J.G.: Arithmetic coding for data compression. Commun. ACM 30(6), 520–540 (1987)

    Article  Google Scholar 

  38. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24(5), 530–536 (1978)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Grossi, R. (2013). Random Access to High-Order Entropy Compressed Text. In: Brodnik, A., López-Ortiz, A., Raman, V., Viola, A. (eds) Space-Efficient Data Structures, Streams, and Algorithms. Lecture Notes in Computer Science, vol 8066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40273-9_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-40273-9_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40272-2

  • Online ISBN: 978-3-642-40273-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics