Abstract
In this paper we describe a data structure that supports pattern matching queries on a dynamically arriving text T over an alphabet of constant size. Each new symbol can be prepended to T in O(1) worst-case time. At any moment, we can report all occurrences of a pattern P in the current text in \(O(|P|+k)\) time, where |P| is the length of P and k is the number of occurrences. Under the assumption of a constant-size alphabet, this resolves the long-standing open problem of whether a real-time indexing method for string matching exists (see Amir and Nor in Real-time indexing over fixed finite alphabets, pp. 1086–1095, 2008).
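To make the interface concrete, the following is a minimal sketch of the two operations the abstract names: prepending a symbol and reporting all occurrences of a pattern. It is a naive baseline and not the data structure of the paper: the query rescans the whole text, so it does not achieve the \(O(|P|+k)\) bound. The class and method names (`NaivePrependIndex`, `prepend`, `occurrences`) are hypothetical and chosen only for illustration.

```python
from typing import List


class NaivePrependIndex:
    """Illustrative baseline only: same interface, none of the time bounds."""

    def __init__(self) -> None:
        # Symbols in arrival order; the current text T is this list reversed,
        # because every new symbol is prepended to T.
        self._arrived: List[str] = []

    def prepend(self, symbol: str) -> None:
        # Appending to the arrival list corresponds to prepending to T.
        self._arrived.append(symbol)

    def text(self) -> str:
        return "".join(reversed(self._arrived))

    def occurrences(self, pattern: str) -> List[int]:
        # Report every starting position of pattern in the current text
        # by a plain left-to-right scan (overlapping matches included).
        if not pattern:
            return []
        t = self.text()
        hits: List[int] = []
        start = t.find(pattern)
        while start != -1:
            hits.append(start)
            start = t.find(pattern, start + 1)
        return hits


index = NaivePrependIndex()
for symbol in reversed("abracadabra"):
    index.prepend(symbol)
print(index.occurrences("abra"))
```

Here the text is fed in last symbol first, so that after all prepend operations the current text reads "abracadabra"; positions are 0-based offsets into the current text.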
Notes
Henceforth, \(\log ^{(3)}n=\log \log \log n\).
For simplicity we assume that \(\log ^{(3)}n\) and \(\log \log n\) are integers and \(\log ^{(3)}n \) divides \(\log \log n\). If this is not the case, we can find \(d'\) and d that satisfy these requirements such that \(\log \log n\le d\le 2\log \log n\) and \(\log ^{(3)}n\le d'\le 2\log ^{(3)}n\).
In fact, the query time is even slightly better.
In fact, it would suffice to store \(3d-1\) most recently read symbols in compact form.
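The note on rounding \(\log ^{(3)}n\) and \(\log \log n\) states that suitable integers \(d'\) and d exist but does not say how to pick them. The sketch below shows one possible choice, under the assumptions that logarithms are base 2 and that \(n\ge 16\) (so that \(\log ^{(3)}n\ge 1\)): round \(\log ^{(3)}n\) up to get \(d'\), then take d as the smallest multiple of \(d'\) that is at least \(\log \log n\). The function name `choose_block_sizes` is hypothetical.

```python
import math
from typing import Tuple


def choose_block_sizes(n: int) -> Tuple[int, int]:
    """Return (d, d_prime) with d_prime dividing d.

    Assumes base-2 logarithms and n >= 16; this is one possible choice,
    not necessarily the one intended by the authors.
    """
    if n < 16:
        raise ValueError("n must be at least 16 so that log log log n >= 1")
    loglog = math.log2(math.log2(n))
    log3 = math.log2(loglog)
    # Rounding up keeps log3 <= d_prime <= 2 * log3 whenever log3 >= 1.
    d_prime = math.ceil(log3)
    # Smallest multiple of d_prime that is >= loglog; it stays below
    # loglog + d_prime, which is at most 2 * loglog for n >= 16.
    d = d_prime * math.ceil(loglog / d_prime)
    return d, d_prime


d, d_prime = choose_block_sizes(10**6)
print(d, d_prime)
```

When n is of the form \(2^{2^{2^{j}}}\), such as \(n=2^{16}\), all three quantities are already integers and the function returns them unchanged.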
References
Amir, A., Kopelowitz, T., Lewenstein, M., Lewenstein, N.: Towards real-time suffix tree construction. In: Consens, M., Navarro, G. (eds.) Proceedings of International Symposium on String Processing and Information Retrieval (SPIRE), volume 3772 of Lecture Notes in Computer Science, pp. 67–78. Springer, Berlin (2005)
Amir, A., Nor, I.: Real-time indexing over fixed finite alphabets. In: Proceedings of 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2008), pp. 1086–1095 (2008)
Breslauer, D., Grossi, R., Mignosi, F.: Simple real-time constant-space string matching. In: Giancarlo, R., Manzini, G. (eds.) Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 6661, pp. 173–183. Springer, Berlin (2011)
Breslauer, D., Italiano, G.F.: Near real-time suffix tree construction via the fringe marked ancestor problem. In: Proceedings of 18th International Symposium on String Processing and Information Retrieval (SPIRE 2011), pp. 156–167 (2011)
Cole, R., Hariharan, R.: Dynamic LCA queries on trees. SIAM J. Comput. 34(4), 894–923 (2005)
Dietz, P.F., Sleator, D.D.: Two algorithms for maintaining order in a list. In: Proceedings of 19th Annual ACM Symposium on Theory of Computing (STOC 1987), pp. 365–372 (1987)
Fischer, J., Gawrychowski, P.: Alphabet-dependent string searching with wexponential search trees. CoRR, abs/1302.3347 (2013)
Fredman, M.L., Willard, D.E.: Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. Syst. Sci. 48(3), 533–551 (1994)
Galil, Z.: String matching in real time. J. ACM 28(1), 134–149 (1981)
Giora, Y., Kaplan, H.: Optimal dynamic vertical ray shooting in rectilinear planar subdivisions. ACM Trans. Algorithms (2009). doi:10.1145/1541885.1541889
Kopelowitz, T.: On-line indexing for general alphabets via predecessor queries on subsets of an ordered list. In: Proceedings of 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2012), pp. 283–292 (2012)
Kosaraju, S.R.: Real-time pattern matching and quasi-real-time construction of suffix trees (preliminary version). In: Proceedings of 26th Annual ACM Symposium on Theory of Computing (STOC 1994), pp. 310–316. ACM (1994)
Kucherov, G., Nekrich, Y., Starikovskaya, T.: Cross-document pattern matching. In: Kärkkäinen, J., Stoye, J. (eds.) Proceedings of the 23rd Annual Symposium on Combinatorial Pattern Matching (CPM), July 3–5, 2012, Helsinki (Finland), volume 7354 of Lecture Notes in Computer Science, pp. 196–207. Springer (2012)
Mortensen, C.W.: Fully-dynamic two dimensional orthogonal range and line segment intersection reporting in logarithmic time. In: Proceedings of 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003), pp. 618–627 (2003)
Navarro, G., Nekrich, Y.: Top-k document retrieval in optimal time and linear space. In: Proceedings of 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2012), pp. 1066–1077 (2012)
Slisenko, A.: String-matching in real time: some properties of the data structure. In: Mathematical Foundations of Computer Science 1978, Proceedings, 7th Symposium, Zakopane, Poland, September 4–8, 1978, volume 64 of Lecture Notes in Computer Science, pp. 493–496. Springer (1978)
van Emde Boas, P., Kaas, R., Zijlstra, E.: Design and implementation of an efficient priority queue. Math. Syst. Theory 10, 99–127 (1977)
Willard, D.E.: A density control algorithm for doing insertions and deletions in a sequentially ordered file in good worst-case time. Inf. Comput. 97(2), 150–204 (1992)
Acknowledgments
GK has been supported by the Labex Bézout program funded by the French government. This work was done during the visit of YN to the Laboratoire d’Informatique Gaspard Monge, supported by Université Paris-Est Marne-la-Vallée and CNRS. We thank the anonymous reviewers for helpful comments.
Cite this article
Kucherov, G., Nekrich, Y. Full-Fledged Real-Time Indexing for Constant Size Alphabets. Algorithmica 79, 387–400 (2017). https://doi.org/10.1007/s00453-016-0199-7
DOI: https://doi.org/10.1007/s00453-016-0199-7