An Online Algorithm for Finding the Longest Previous Factors

Okanohara, Daisuke; Sadakane, Kunihiko

doi:10.1007/978-3-540-87744-8_58

Daisuke Okanohara¹ &
Kunihiko Sadakane²

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5193))

Included in the following conference series:

European Symposium on Algorithms

1893 Accesses
13 Citations

Abstract

We present a novel algorithm for finding the longest factors in a text, for which the working space is proportional to the history text size. Moreover, our algorithm is online and exact; in that, unlike the previous batch algorithms [4, 5, 6, 7, 14], which needs to read the entire input beforehand, our algorithm reports the longest match just after reading each character. This algorithm can be directly used for data compression, pattern analysis, and data mining. Our algorithm also supports the window buffer, in that we can bound the working space by discarding the history from the oldest character. Using the dynamic rank/select dictionary [17], our algorithm requires n logσ + O(n logσ) + O(n) bits of working space, and O(log³ n) time per character, O(n log³ n) total time, n is the length of the history, and σ is the alphabet size. We implemented our algorithm and compared it with the recent algorithms [4, 5, 14] in terms of speed and the working space. We found that our algorithm can work with a smaller working space, less than 1/2 of those for the previous methods in real-world data, and with a reasonable decline in speed.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 149.00; Price excludes VAT (USA)

Softcover Book: USD 189.00; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms 2(1), 53–86 (2004)
Article MATH MathSciNet Google Scholar
Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
Google Scholar
Chan, H., Hon, W.K., Lam, T.W., Sadakane, K.: Compressed indexes for dynamic text collections. ACM Transactions on Algorithms 3(2), 21 (2007)
Article MathSciNet Google Scholar
Chen, G., Puglisi, S.J., Smyth, W.F.: LZ factorization in less time and space. Mathematics in Computer Science (MCS) Special Issue on Combinatorial Algorithms (2008)
Google Scholar
Chen, G., Puglisi, S.J., Smyth, W.: Fast and practical algorithms for computing all the runs in a string. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 307–315. Springer, Heidelberg (2007)
Chapter Google Scholar
Crochemore, M., Ilie, L.: LZ factorization in less time and space. Information Processing Letters 106, 75–80 (2008)
MathSciNet Google Scholar
Crochemore, M., Ilie, L., Smyth, W.F.: A simple algorithm for computing the Lempel–Ziv factorization. In: DCC, pp. 482–488 (2008)
Google Scholar
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. of FOCS (2000)
Google Scholar
Fischer, J., Heun, V.: Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 36–48. Springer, Heidelberg (2006)
Chapter Google Scholar
Fischer, J., Heun, V.: A new succinct representation of rmq-information and improvements in the enhanced suffix array. In: Chen, B., Paterson, M., Zhang, G. (eds.) ESCAPE 2007. LNCS, vol. 4614. Springer, Heidelberg (2007)
Chapter Google Scholar
Franek, F., Simpson, R.J., Smyth, W.F.: The maximum number of runs in a string. In: AWOCA, pp. 26–35 (2003)
Google Scholar
Gonnet, G.H., Baeza-Yates, R., Snider, T.: New indices for text: PAT trees and PAT arrays. Information Retrieval: Algorithms and Data Structures, 66–82 (1992)
Google Scholar
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)
Google Scholar
Kolpakov, R., Kucherov, G.: Mreps, http://bioinfo.lifl.fr/mreps/
Larsson, J.: Extended application of suffix trees to data compression. In: Proc. of DCC, pp. 190–199 (1996)
Google Scholar
Larsson, J.: Structures of String Matching and Data Compression. PhD thesis, Lund University (1999)
Google Scholar
Lee, S., Park, K.: Dynamic rank-select structures with applications to run-length encoded texts. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 95–106. Springer, Heidelberg (2007)
Chapter Google Scholar
Lippert, R., Mobarry, C., Walenz, B.: A space-efficient construction of the burrows wheeler transform for genomic data. Journal of Computational Biology (2005)
Google Scholar
Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
Article MATH MathSciNet Google Scholar
Moffat, A.: An improved data structure for cumulative probability tables. Software: Practice and Experience 29, 647–659 (1999)
Article Google Scholar
Mori, Y.: libdivsufsort, http://code.google.com/p/libdivsufsort/
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1) (2007)
Google Scholar
Sadakane, K.: Succinct representations of LCP information and improvements in the compressed suffi arrays. In: ACM-SIAM SODA, pp. 225–232 (2002)
Google Scholar
Sadakane, K.: Compressed suffix trees with full functionality. J. Theory of Computing Systems (2007)
Google Scholar
Smyth, W.F.: http://www.cas.mcmaster.ca/~bill/strbings/
Weiner, P.: Linear pattern matching algorihms. In: Proceedings of the 14th IEEE Symposium on Switching and Automata Theory, pp. 1–11 (1973)
Google Scholar
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23(3), 337–343 (1977)
Article MATH MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0013, Japan
Daisuke Okanohara
Department of Computer Science and Communication Engineering, Kyushu University, Motooka 744, Nishi-ku, Fukuoka, 819-0395, Japan
Kunihiko Sadakane

Authors

Daisuke Okanohara
View author publications
You can also search for this author in PubMed Google Scholar
Kunihiko Sadakane
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Dan Halperin Kurt Mehlhorn

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Okanohara, D., Sadakane, K. (2008). An Online Algorithm for Finding the Longest Previous Factors. In: Halperin, D., Mehlhorn, K. (eds) Algorithms - ESA 2008. ESA 2008. Lecture Notes in Computer Science, vol 5193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87744-8_58

Download citation

DOI: https://doi.org/10.1007/978-3-540-87744-8_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87743-1
Online ISBN: 978-3-540-87744-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics