An Online Algorithm for Finding the Longest Previous Factors

  • Conference paper
Algorithms - ESA 2008 (ESA 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5193)

Abstract

We present a novel algorithm for finding the longest previous factors in a text, for which the working space is proportional to the size of the history text. Moreover, our algorithm is online and exact: unlike the previous batch algorithms [4, 5, 6, 7, 14], which need to read the entire input beforehand, our algorithm reports the longest match just after reading each character. The algorithm can be used directly for data compression, pattern analysis, and data mining. It also supports a window buffer, in that the working space can be bounded by discarding the history starting from the oldest character. Using the dynamic rank/select dictionary [17], our algorithm requires n log σ + o(n log σ) + O(n) bits of working space and O(log^3 n) time per character, for O(n log^3 n) total time, where n is the length of the history and σ is the alphabet size. We implemented our algorithm and compared it with recent algorithms [4, 5, 14] in terms of speed and working space. On real-world data, our algorithm used less than half the working space of the previous methods, with a reasonable decline in speed.
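
To make the quantity being computed concrete, the following is a minimal brute-force sketch of the standard longest-previous-factor array. It is not the algorithm of the paper: it runs in quadratic time, uses no rank/select dictionary, and does not model the window buffer. The function name `longest_previous_factors` is hypothetical, and the definition used (the longest prefix of the suffix starting at position i that also starts at some earlier position, with overlaps allowed) is assumed from the usual meaning of the term rather than taken from the abstract.

```python
def longest_previous_factors(text):
    """Brute-force LPF array (illustrative only, not the paper's method).

    lpf[i] is the length of the longest prefix of text[i:] that also
    occurs starting at some position j < i. Overlapping occurrences
    are allowed.
    """
    n = len(text)
    lpf = [0] * n
    for i in range(n):
        best = 0
        # Try every earlier starting position j as a candidate match.
        for j in range(i):
            k = 0
            # Extend the match while characters agree and we stay in bounds.
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            best = max(best, k)
        lpf[i] = best
    return lpf


if __name__ == "__main__":
    print(longest_previous_factors("abab"))
    print(longest_previous_factors("aaaa"))
```

Because each position is compared only against earlier positions, the same loop could be restricted to the most recent positions to mimic a bounded history, although the sketch above keeps the full history for simplicity.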

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms 2(1), 53–86 (2004)

  2. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)

  3. Chan, H., Hon, W.K., Lam, T.W., Sadakane, K.: Compressed indexes for dynamic text collections. ACM Transactions on Algorithms 3(2), 21 (2007)

  4. Chen, G., Puglisi, S.J., Smyth, W.F.: LZ factorization in less time and space. Mathematics in Computer Science (MCS) Special Issue on Combinatorial Algorithms (2008)

  5. Chen, G., Puglisi, S.J., Smyth, W.: Fast and practical algorithms for computing all the runs in a string. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 307–315. Springer, Heidelberg (2007)

  6. Crochemore, M., Ilie, L.: LZ factorization in less time and space. Information Processing Letters 106, 75–80 (2008)

  7. Crochemore, M., Ilie, L., Smyth, W.F.: A simple algorithm for computing the Lempel–Ziv factorization. In: DCC, pp. 482–488 (2008)

  8. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. of FOCS (2000)

  9. Fischer, J., Heun, V.: Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 36–48. Springer, Heidelberg (2006)

  10. Fischer, J., Heun, V.: A new succinct representation of rmq-information and improvements in the enhanced suffix array. In: Chen, B., Paterson, M., Zhang, G. (eds.) ESCAPE 2007. LNCS, vol. 4614. Springer, Heidelberg (2007)

  11. Franek, F., Simpson, R.J., Smyth, W.F.: The maximum number of runs in a string. In: AWOCA, pp. 26–35 (2003)

  12. Gonnet, G.H., Baeza-Yates, R., Snider, T.: New indices for text: PAT trees and PAT arrays. Information Retrieval: Algorithms and Data Structures, 66–82 (1992)

  13. Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)

  14. Kolpakov, R., Kucherov, G.: Mreps, http://bioinfo.lifl.fr/mreps/

  15. Larsson, J.: Extended application of suffix trees to data compression. In: Proc. of DCC, pp. 190–199 (1996)

  16. Larsson, J.: Structures of String Matching and Data Compression. PhD thesis, Lund University (1999)

  17. Lee, S., Park, K.: Dynamic rank-select structures with applications to run-length encoded texts. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 95–106. Springer, Heidelberg (2007)

  18. Lippert, R., Mobarry, C., Walenz, B.: A space-efficient construction of the Burrows-Wheeler transform for genomic data. Journal of Computational Biology (2005)

  19. Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

  20. Moffat, A.: An improved data structure for cumulative probability tables. Software: Practice and Experience 29, 647–659 (1999)

  21. Mori, Y.: libdivsufsort, http://code.google.com/p/libdivsufsort/

  22. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1) (2007)

  23. Sadakane, K.: Succinct representations of LCP information and improvements in the compressed suffix arrays. In: ACM-SIAM SODA, pp. 225–232 (2002)

  24. Sadakane, K.: Compressed suffix trees with full functionality. J. Theory of Computing Systems (2007)

  25. Smyth, W.F.: http://www.cas.mcmaster.ca/~bill/strbings/

  26. Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th IEEE Symposium on Switching and Automata Theory, pp. 1–11 (1973)

  27. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23(3), 337–343 (1977)

Editor information

Dan Halperin, Kurt Mehlhorn

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Okanohara, D., Sadakane, K. (2008). An Online Algorithm for Finding the Longest Previous Factors. In: Halperin, D., Mehlhorn, K. (eds) Algorithms - ESA 2008. ESA 2008. Lecture Notes in Computer Science, vol 5193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87744-8_58

  • DOI: https://doi.org/10.1007/978-3-540-87744-8_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87743-1

  • Online ISBN: 978-3-540-87744-8

  • eBook Packages: Computer Science, Computer Science (R0)