Abstract
This paper addresses compressed pattern matching for text compressed using antidictionaries, a compression scheme recently proposed by Crochemore et al. (1998). We present an algorithm that preprocesses a pattern of length m and an antidictionary M in O(m² + ‖M‖) time, and then scans a compressed text of length n in O(n + r) time to find all pattern occurrences, where ‖M‖ is the total length of the strings in M and r is the number of pattern occurrences.
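For readers unfamiliar with the scheme, the following is a minimal sketch of antidictionary compression over a binary alphabet, under the usual assumption that the antidictionary contains only words that never occur in the text. Whenever a suffix of the text read so far, extended by one symbol, would form a forbidden word, the next symbol is forced and can be omitted from the output. The function names, the example text, and the example antidictionary are illustrative assumptions, not taken from the paper. The sketch is deliberately naive and shows only the compression scheme, not the matching algorithm or its stated time bounds.

```python
def forced_symbol(prefix, antidictionary):
    """Return the only symbol that may follow `prefix`, or None if both may.

    A forbidden word u+b whose context u is a suffix of `prefix` rules out b,
    so the next symbol must be the other binary symbol.
    """
    for word in antidictionary:
        context, forbidden = word[:-1], word[-1]
        if prefix.endswith(context):
            return "1" if forbidden == "0" else "0"
    return None


def compress(text, antidictionary):
    """Keep only the symbols of `text` that the antidictionary cannot predict."""
    kept = []
    for i, symbol in enumerate(text):
        if forced_symbol(text[:i], antidictionary) is None:
            kept.append(symbol)
    return "".join(kept)


def decompress(code, length, antidictionary):
    """Rebuild a text of the given length from the kept symbols."""
    remaining = iter(code)
    text = ""
    for _ in range(length):
        forced = forced_symbol(text, antidictionary)
        text += forced if forced is not None else next(remaining)
    return text


# Hypothetical example: none of these words occurs in the text below.
antidictionary = ["00", "0110", "1011", "1111"]
code = compress("01110101", antidictionary)
print(code)                                 # prints "01"
print(decompress(code, 8, antidictionary))  # prints "01110101"
```

The decoder needs the original text length in addition to the kept symbols, because forced symbols leave no trace in the compressed output. A pattern matcher that works directly on the compressed text has to account for these omitted symbols while scanning, which is the problem the paper addresses.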
References
A.V. Aho and M. Corasick. Efficient string matching: An aid to bibliographic search. Comm. ACM, 18(6):333–340, 1975.
A. Amir and G. Benson. Efficient two-dimensional compressed matching. In Proc. Data Compression Conference ’92, page 279, 1992.
A. Amir and G. Benson. Two-dimensional periodicity and its application. In Proc. 3rd Ann. ACM-SIAM Symp. on Discrete Algorithms, pages 440–452, 1992.
A. Amir, G. Benson, and M. Farach. Let sleeping files lie: Pattern matching in Z-compressed files. Journal of Computer and System Sciences, 52:299–307, 1996.
A. Amir, G. Benson, and M. Farach. Optimal two-dimensional compressed matching. Journal of Algorithms, 24(2):354–379, 1997.
A. Amir, G.M. Landau, and U. Vishkin. Efficient pattern matching with scaling. Journal of Algorithms, 13(1):2–32, 1992.
M. Crochemore, F. Mignosi, and A. Restivo. Minimal forbidden words and factor automata. In L. Brim, J. Gruska, and J. Zlatuska, editors, Proc. 23rd International Symp. on Mathematical Foundations of Computer Science, volume 1450 of Lecture Notes in Computer Science, pages 665–673. Springer-Verlag, 1998.
M. Crochemore, F. Mignosi, A. Restivo, and S. Salemi. Text compression using antidictionaries. Technical Report IGM-98-10, Institut Gaspard-Monge, 1998.
E.S. de Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Direct pattern matching on compressed text. In Proc. 5th International Symp. on String Processing and Information Retrieval, pages 90–95. IEEE Computer Society, 1998.
E.S. de Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast sequential searching on compressed texts allowing errors. In Proc. 21st Ann. International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 298–306. York Press, 1998.
T. Eilam-Tzoreff and U. Vishkin. Matching patterns in strings subject to multi-linear transformations. Theoretical Computer Science, 60(3):231–254, 1988.
M. Farach and M. Thorup. String-matching in Lempel-Ziv compressed strings. In Proc. 27th Ann. ACM Symp. on Theory of Computing, pages 703–713, 1995.
S. Fukamachi, T. Shinohara, and M. Takeda. String pattern matching for compressed data using variable length codes. Submitted, 1998.
L. Gąsieniec, M. Karpinski, W. Plandowski, and W. Rytter. Efficient algorithms for Lempel-Ziv encoding. In Proc. 4th Scandinavian Workshop on Algorithm Theory, volume 1097 of Lecture Notes in Computer Science, pages 392–403. Springer-Verlag, 1996.
M. Karpinski, W. Rytter, and A. Shinohara. An efficient pattern-matching algorithm for strings with short descriptions. Nordic Journal of Computing, 4:172–186, 1997.
T. Kida, M. Takeda, A. Shinohara, and S. Arikawa. Shift-And approach to pattern matching in LZW compressed text. In Proc. 10th Ann. Symp. on Combinatorial Pattern Matching, Lecture Notes in Computer Science. Springer-Verlag, 1999. To appear.
T. Kida, M. Takeda, A. Shinohara, M. Miyazaki, and S. Arikawa. Multiple pattern matching in LZW compressed text. In Proc. Data Compression Conference ’98, pages 103–112. IEEE Computer Society, 1998.
U. Manber. A text compression scheme that allows fast searching directly in the compressed file. In Proc. 5th Ann. Symp. on Combinatorial Pattern Matching, volume 807 of Lecture Notes in Computer Science, pages 113–124. Springer-Verlag, 1994.
M. Miyazaki, S. Fukamachi, M. Takeda, and T. Shinohara. Speeding up the pattern matching machine for compressed texts. Transactions of Information Processing Society of Japan, 39(9):2638–2648, 1998. (in Japanese).
M. Miyazaki, A. Shinohara, and M. Takeda. An improved pattern matching algorithm for strings in terms of straight-line programs. In Proc. 8th Ann. Symp. on Combinatorial Pattern Matching, volume 1264 of Lecture Notes in Computer Science, pages 1–11. Springer-Verlag, 1997.
Y. Shibata, T. Kida, S. Fukamachi, M. Takeda, A. Shinohara, T. Shinohara, and S. Arikawa. Byte pair encoding: a text compression scheme that accelerates pattern matching. Technical Report DOI-TR-161, Department of Informatics, Kyushu University, April 1999.
M. Takeda. Pattern matching machine for text compressed using finite state model. Technical Report DOI-TR-142, Department of Informatics, Kyushu University, October 1997.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S. (1999). Pattern Matching in Text Compressed by Using Antidictionaries. In: Crochemore, M., Paterson, M. (eds) Combinatorial Pattern Matching. CPM 1999. Lecture Notes in Computer Science, vol 1645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48452-3_3
DOI: https://doi.org/10.1007/3-540-48452-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66278-5
Online ISBN: 978-3-540-48452-3
eBook Packages: Springer Book Archive