On Hardness of Jumbled Indexing

Amir, Amihood; Chan, Timothy M.; Lewenstein, Moshe; Lewenstein, Noa

doi:10.1007/978-3-662-43948-7_10

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8572)

Included in the following conference series:

International Colloquium on Automata, Languages, and Programming

Abstract

Jumbled indexing is the problem of indexing a text T for queries that ask whether there is a substring of T matching a pattern represented as a Parikh vector, i.e., the vector of frequency counts for each character. Jumbled indexing has garnered a lot of interest in the last four years; for a partial list see [2,6,13,16,17,20,22,24,26,30,35,36]. There is a naive algorithm that preprocesses all answers in \(O(n^2|\Sigma|)\) time allowing quick queries afterwards, and there is another naive algorithm that requires no preprocessing but has \(O(n\log|\Sigma|)\) query time. Despite a tremendous amount of effort there has been little improvement over these running times.

In this paper we provide good reason for this. We show that, under a 3SUM-hardness assumption, jumbled indexing for alphabets of size \(\omega(1)\) requires \(\Omega(n^{2-\epsilon})\) preprocessing time or \(\Omega(n^{1-\delta})\) query time for any \(\epsilon, \delta > 0\). In fact, under a stronger 3SUM-hardness assumption, for any constant alphabet size \(r \ge 3\) there exist describable fixed constants \(\epsilon_r\) and \(\delta_r\) such that jumbled indexing requires \(\Omega(n^{2-\epsilon_r})\) preprocessing time or \(\Omega(n^{1-\delta_r})\) query time.
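
For readers unfamiliar with 3SUM, the following sketch shows the textbook version of the problem: given n integers, decide whether three of them (at distinct positions) sum to zero. The classical sort-and-two-pointer method below runs in \(O(n^2)\) time, and 3SUM-hardness assumptions posit that no algorithm runs in \(O(n^{2-\epsilon})\) time for a constant \(\epsilon > 0\). The abstract does not spell out the exact form of the assumption or of its stronger variant, so this is only the common formulation; the function name `has_3sum` is chosen for this example.

```python
def has_3sum(nums):
    """Return True if nums[i] + nums[j] + nums[k] == 0 for distinct i, j, k.

    Sort once, then for each smallest element scan the rest with two pointers.
    Total time is O(n^2) after the O(n log n) sort.
    """
    a = sorted(nums)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1      # sum too small: move to a larger middle element
            else:
                hi -= 1      # sum too large: move to a smaller top element
    return False


if __name__ == "__main__":
    print(has_3sum([-5, 1, 4, 10]))
    print(has_3sum([1, 2, 3]))
```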

References

1. Amir, A., Apostolico, A., Landau, G.M., Satta, G.: Efficient text fingerprinting via Parikh mapping. J. Discrete Algorithms 1(5-6), 409–421 (2003)
2. Amir, A., Butman, A., Porat, E.: On the relationship between histogram indexing and block-mass indexing. In: Philosophical Transactions A (to appear)
3. Amir, A., Church, K.W., Dar, E.: Separable attributes: a technique for solving the sub matrices character count problem. In: SODA, pp. 400–401 (2002)
4. Amir, A., Farach, M., Muthukrishnan, S.: Alphabet dependence in parameterized matching. Inf. Process. Lett. 49(3), 111–115 (1994)
5. Phanendra Babu, G., Mehtre, B.M., Kankanhalli, M.S.: Color indexing for efficient image retrieval. Multimedia Tools and Applications 1(4), 327–348 (1995)
6. Badkobeh, G., Fici, G., Kroon, S., Lipták, Z.: Binary jumbled string matching for highly run-length compressible texts. Inf. Process. Lett. 113(17), 604–608 (2013)
7. Baker, B.S.: Parameterized pattern matching: Algorithms and applications. J. Comput. Syst. Sci. 52(1), 28–42 (1996)
8. Baker, B.S.: Parameterized duplication in strings: Algorithms and an application to software maintenance. SIAM J. Comput. 26(5), 1343–1362 (1997)
9. Baran, I., Demaine, E.D., Pătraşcu, M.: Subquadratic algorithms for 3SUM. In: Dehne, F., López-Ortiz, A., Sack, J.-R. (eds.) WADS 2005. LNCS, vol. 3608, pp. 409–421. Springer, Heidelberg (2005)
10. Böcker, S.: Simulating multiplexed SNP discovery rates using base-specific cleavage and mass spectrometry. Bioinformatics 23(2), 5–12 (2007)
11. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)
12. Bremner, D., Chan, T.M., Demaine, E.D., Erickson, J., Hurtado, F., Iacono, J., Langerman, S., Pătraşcu, M., Taslakian, P.: Necklaces, convolutions, and X + Y. Algorithmica 69, 294–314 (2014)
13. Burcsi, P., Cicalese, F., Fici, G., Lipták, Z.: On table arrangements, scrabble freaks, and jumbled pattern matching. In: Boldi, P. (ed.) FUN 2010. LNCS, vol. 6099, pp. 89–101. Springer, Heidelberg (2010)
14. Butman, A., Eres, R., Landau, G.M.: Scaled and permuted string matching. Inf. Process. Lett. 92(6), 293–297 (2004)
15. Butman, A., Lewenstein, N., Munro, I.J.: Permuted scaled matching. In: CPM 2014 (to appear, 2014)
16. Cicalese, F., Fici, G., Lipták, Z.: Searching for jumbled patterns in strings. In: Prague Stringology Conference, pp. 105–117 (2009)
17. Cicalese, F., Laber, E.S., Weimann, O., Yuster, R.: Near linear time construction of an approximate index for all maximum consecutive sub-sums of a sequence. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 149–158. Springer, Heidelberg (2012)
18. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC, pp. 91–100 (2004)
19. Crochemore, M., Iliopoulos, C.S., Kociumaka, T., Kubica, M., Langiu, A., Pissis, S.P., Radoszewski, J., Rytter, W., Waleń, T.: Order-preserving incomplete suffix trees and order-preserving indexes. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 84–95. Springer, Heidelberg (2013)
20. Durocher, S., Ian Munro, J., Mondal, D., Thankachan, S.V.: Jumbled pattern matching over large alphabets (2014) (manuscript, personal communication)
21. Eres, R., Landau, G.M., Parida, L.: Permutation pattern discovery in biosequences. Journal of Computational Biology 11(6), 1050–1060 (2004)
22. Gagie, T., Hermelin, D., Landau, G.M., Weimann, O.: Binary jumbled pattern matching on trees and tree-like structures. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 517–528. Springer, Heidelberg (2013)
23. Gajentaan, A., Overmars, M.H.: On a class of O(n²) problems in computational geometry. Comput. Geom. 5, 165–185 (1995)
24. Giaquinta, E., Grabowski, S.: New algorithms for binary jumbled pattern matching. Inf. Process. Lett. 113(14-16), 538–542 (2013)
25. Hazay, C., Lewenstein, M., Sokol, D.: Approximate parameterized matching. ACM Transactions on Algorithms 3(3) (2007)
26. Hermelin, D., Landau, G.M., Rabinovich, Y., Weimann, O.: Binary jumbled pattern matching via all-pairs shortest paths (2014) (manuscript), http://arxiv.org/abs/1401.2065
27. Holub, S.: Parikh test sets for commutative languages. ITA 42(3), 525–537 (2008)
28. Huang, X., Ali, H., Sadanandam, A., Singh, R.: SRPVS: a new motif searching algorithm for protein analysis. In: Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004, pp. 674–675 (2004)
29. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
30. Kociumaka, T., Radoszewski, J., Rytter, W.: Efficient indexes for jumbled pattern matching with constant-sized alphabet. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 625–636. Springer, Heidelberg (2013)
31. Kopczynski, E., Widjaja, A.: Parikh images of grammars: Complexity and applications. In: LICS, pp. 80–89 (2010)
32. Kubica, M., Kulczynski, T., Radoszewski, J., Rytter, W., Walen, T.: A linear time algorithm for consecutive permutation pattern matching. Inf. Process. Lett. 113(12), 430–433 (2013)
33. Lee, L.-K., Lewenstein, M., Zhang, Q.: Parikh matching in the streaming model. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 336–341. Springer, Heidelberg (2012)
34. Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
35. Moosa, T.M., Rahman, M.S.: Indexing permutations for binary strings. Inf. Process. Lett. 110(18-19), 795–798 (2010)
36. Moosa, T.M., Rahman, M.S.: Sub-quadratic time and linear space data structures for permutation matching in binary strings. J. Discrete Algorithms 10, 5–9 (2012)
37. Parikh, R.: On context-free languages. J. ACM 13(4), 570–581 (1966)
38. Pătraşcu, M.: Towards polynomial lower bounds for dynamic problems. In: STOC, pp. 603–610 (2010)
39. Swain, M.J., Ballard, D.H.: Color indexing. International Journal of Computer Vision 7(1), 11–32 (1991)
40. Weiner, P.: Linear pattern matching algorithms. In: SWAT (FOCS), pp. 1–11 (1973)
41. Williams, R.: Faster all-pairs shortest paths via circuit complexity. In: STOC (to appear, 2014)
42. Williams, V.V., Williams, R.: Subcubic equivalences between path, matrix and triangle problems. In: FOCS, pp. 645–654 (2010)

Author information

Authors and Affiliations

Bar-Ilan University, Israel
Amihood Amir & Moshe Lewenstein
Johns Hopkins University, USA
Amihood Amir
University of Waterloo, Canada
Timothy M. Chan
Netanya College, Israel
Noa Lewenstein

Editor information

Editors and Affiliations

Institut für Informatik, Technische Universität München, Boltzmannstrasse 3, 85748, München, Germany
Javier Esparza
LIAFA, Université Paris Diderot-Paris 7, Case 7014, 75205, Paris Cedex 13, France
Pierre Fraigniaud
IT University of Copenhagen, Rued Langgaards Vej 7, 2300, Copenhagen, Denmark
Thore Husfeldt
Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Elias Koutsoupias

Cite this paper

Amir, A., Chan, T.M., Lewenstein, M., Lewenstein, N. (2014). On Hardness of Jumbled Indexing. In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds) Automata, Languages, and Programming. ICALP 2014. Lecture Notes in Computer Science, vol 8572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43948-7_10

DOI: https://doi.org/10.1007/978-3-662-43948-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43947-0
Online ISBN: 978-3-662-43948-7
eBook Packages: Computer Science, Computer Science (R0)
