Suffix Trays and Suffix Trists: Structures for Faster Text Indexing

Cole, Richard; Kopelowitz, Tsvi; Lewenstein, Moshe

doi:10.1007/s00453-013-9860-6

Published: 18 January 2014

Algorithmica, Volume 72, pages 450–466 (2015)

Abstract

Suffix trees and suffix arrays are two of the most widely used data structures for text indexing. Each uses linear space and can be constructed in linear time for polynomially sized alphabets. However, when it comes to answering queries with worst-case deterministic time bounds, the former does so in O(m log|Σ|) time, where m is the query size and |Σ| is the alphabet size, while the latter does so in O(m+log n) time, where n is the text size. If one wants to output all appearances of the query, an additive cost of O(occ) time is sufficient, where occ is the size of the output. Note that it is possible to obtain a worst-case deterministic query time of O(m), but at the cost of super-linear construction time or space usage.
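As a rough illustration of how a suffix array answers an indexing query, the following sketch may help. It is not the construction from the article: the suffix array is built by naive sorting (not in linear time), and the search is a plain binary search costing O(m log n) character comparisons, not the O(m+log n) bound quoted above, which needs additional longest-common-prefix information. The function names `build_suffix_array` and `find_occurrences` are hypothetical and chosen only for this example.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction by sorting; this is NOT a linear-time algorithm.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return every position where pattern occurs in text, in increasing order.

    Suffixes that start with the pattern form one contiguous block of the
    suffix array, so two binary searches locate the block boundaries.
    """
    m = len(pattern)

    # Lower boundary: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper boundary: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Reporting the block is the additive O(occ) part of the query cost.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Each binary-search step compares up to m characters, which is where the multiplicative log n factor in this simplified version comes from.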

We propose a novel way of combining the two into what we call a suffix tray. The space and construction time remain linear and the query time improves to O(m+log|Σ|) for integer alphabets from a linear range, i.e. Σ⊂{1,…,cn}, for an arbitrary constant c. The construction and query are deterministic. Here too, an additive O(occ) time is sufficient if one desires to output all appearances of the query.
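To get a feel for how the three query bounds compare, one can plug in sample values. The figures below are purely illustrative assumptions (m = 20, |Σ| = 256, n = 2^30, logarithms base 2), and they ignore the constants hidden by the O-notation, so they indicate growth rates only, not measured running times.

```latex
\begin{align*}
\text{suffix tree:}  \quad & m \log|\Sigma| = 20 \cdot 8 = 160 \\
\text{suffix array:} \quad & m + \log n     = 20 + 30    = 50  \\
\text{suffix tray:}  \quad & m + \log|\Sigma| = 20 + 8   = 28
\end{align*}
```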

We also consider the online version of indexing, where the text arrives online, one character at a time, and indexing queries are answered in tandem. In this variant we create a cross between a suffix tree and a suffix list (a dynamic variant of the suffix array), which we call a suffix trist; it supports queries in O(m+log|Σ|) time. The suffix trist also uses linear space. Furthermore, if there exists an online construction for a linear-space suffix tree such that the cost of adding a character is worst-case deterministic f(n,|Σ|), where n is the size of the current text, then one can further update the suffix trist in O(f(n,|Σ|)+log|Σ|) time. The best currently known worst-case deterministic bound for f(n,|Σ|) is O(log n) time.

Notes

Note that the $ character is a special delimiter which appears only at the end of the text and is considered to be lexicographically larger than all of the other integers in Σ.
This can easily be done with the RMQ data structure. However, the original result of Manber and Myers [15] is slightly more rigid. The more expansive view is described in [14].

References

1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2(1), 53–86 (2004)
2. Aho, A.V., Hopcroft, J.E., Ullman, J.D.: The Design and Analysis of Computer Algorithms. Addison-Wesley, New York (1974)
3. Amir, A., Kopelowitz, T., Lewenstein, M., Lewenstein, N.: Towards real-time suffix tree construction. In: Proc. of Symp. on String Processing and Information Retrieval (SPIRE), pp. 67–78 (2005)
4. Breslauer, D., Italiano, G.F.: Near real-time suffix tree construction via the fringe marked ancestor problem. J. Discrete Algorithms 18, 32–48 (2013)
5. Cole, R., Lewenstein, M.: Multidimensional matching and fast search in suffix trees. In: Proc. of the Symposium on Discrete Algorithms (SODA), pp. 851–852 (2003)
6. Dietz, P.F., Sleator, D.D.: Two algorithms for maintaining order in a list. In: Proc. of Symposium on Theory of Computing (STOC), pp. 365–372 (1987)
7. Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. ACM 47(6), 987–1011 (2000)
8. Franceschini, G., Grossi, R.: A general technique for managing strings in comparison-driven data structures. In: Proc. 31st Intl. Col. on Automata, Languages and Programming (ICALP). LNCS, vol. 3142, pp. 606–617 (2004)
9. Grossi, R., Italiano, G.F.: Efficient techniques for maintaining multidimensional keys in linked data structures. In: Proc. of the Intl. Col. on Automata, Languages and Programming (ICALP), pp. 372–381 (1999)
10. Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM 53(6), 918–936 (2006)
11. Kim, D.K., Sim, J.S., Park, H., Park, K.: Constructing suffix arrays in linear time. J. Discrete Algorithms 3(2–4), 126–142 (2005)
12. Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. J. Discrete Algorithms 3(2–4), 143–156 (2005)
13. Kopelowitz, T.: On-line indexing for general alphabets via predecessor queries on subsets of an ordered list. In: Proc. of the Symposium on Foundations of Computer Science (FOCS), pp. 283–292 (2012)
14. Lewenstein, M.: Orthogonal range searching for text indexing. In: Space-Efficient Data Structures, Streams, and Algorithms. LNCS, vol. 8066, pp. 267–302 (2013)
15. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
16. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23, 262–272 (1976)
17. Mehlhorn, K.: Data Structures and Algorithms 1: Sorting and Searching. EATCS Monographs in Theoretical Computer Science. Springer, Berlin (1984)
18. Ruzic, M.: Constructing efficient dictionaries in close to sorting time. In: Proc. of the Intl. Col. on Automata, Languages and Programming (ICALP), vol. 1, pp. 84–95 (2008)
19. Tarjan, R.E.: Data Structures and Network Algorithms. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 44. SIAM, Philadelphia (1983)
20. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)
21. Weiner, P.: Linear pattern matching algorithm. In: Proc. 14th IEEE Symposium on Switching and Automata Theory, pp. 1–11 (1973)

Author information

Authors and Affiliations

Department of Computer Science, Courant Institute, NYU, New York, USA
Richard Cole
Dept. of Computer Science, Bar-Ilan U., 52900, Ramat-Gan, Israel
Tsvi Kopelowitz & Moshe Lewenstein

Corresponding author

Correspondence to Moshe Lewenstein.

Additional information

Results from this paper have appeared as an extended abstract in ICALP 2006.

Cole’s work was supported in part by NSF grant CCF-1217989. Lewenstein’s research was supported by a BSF grant (#2010437) and a GIF grant (#1147/2011).

About this article

Cite this article

Cole, R., Kopelowitz, T. & Lewenstein, M. Suffix Trays and Suffix Trists: Structures for Faster Text Indexing. Algorithmica 72, 450–466 (2015). https://doi.org/10.1007/s00453-013-9860-6

Received: 14 November 2012
Accepted: 17 December 2013
Published: 18 January 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s00453-013-9860-6
