On Hardness of Jumbled Indexing

  • Conference paper
Automata, Languages, and Programming (ICALP 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8572)

Abstract

Jumbled indexing is the problem of indexing a text T for queries that ask whether there is a substring of T matching a pattern represented as a Parikh vector, i.e., the vector of frequency counts for each character. Jumbled indexing has garnered a lot of interest in the last four years; for a partial list see [2,6,13,16,17,20,22,24,26,30,35,36]. There is a naive algorithm that preprocesses all answers in \(O(n^2|\Sigma|)\) time allowing quick queries afterwards, and there is another naive algorithm that requires no preprocessing but has \(O(n\log|\Sigma|)\) query time. Despite a tremendous amount of effort there has been little improvement over these running times.
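The two naive approaches mentioned above can be sketched as follows. This is an illustrative sketch only, not the authors' code: the function names `build_index`, `index_query`, and `jumbled_query` are hypothetical, and the sketch uses hash-based counters, so its running times are expected-time bounds rather than the comparison-based bounds quoted in the abstract.

```python
from collections import Counter


def build_index(text):
    """Naive preprocessing: store the Parikh vector of every substring.

    There are O(n^2) substrings and each stored vector has up to |Sigma|
    entries, giving roughly O(n^2 |Sigma|) work (hypothetical helper).
    """
    index = set()
    n = len(text)
    for i in range(n):
        counts = Counter()
        for j in range(i, n):
            counts[text[j]] += 1
            # Counts only ever grow here, so every stored entry is positive.
            index.add(frozenset(counts.items()))
    return index


def index_query(index, pattern_counts):
    """Look up a Parikh vector (dict: char -> count) in the prebuilt index."""
    key = frozenset((c, k) for c, k in pattern_counts.items() if k > 0)
    return key in index


def jumbled_query(text, pattern_counts):
    """No preprocessing: slide a window whose length is the pattern length."""
    target = {c: k for c, k in pattern_counts.items() if k > 0}
    m = sum(target.values())
    if m == 0:
        return True   # the empty pattern matches the empty substring
    if m > len(text):
        return False
    window = Counter(text[:m])
    # Number of characters whose window count differs from the target count.
    chars = set(target) | set(window)
    mismatches = sum(1 for c in chars if window[c] != target.get(c, 0))
    if mismatches == 0:
        return True
    for i in range(m, len(text)):
        # Add the entering character, then drop the leaving one.
        for c, delta in ((text[i], 1), (text[i - m], -1)):
            before = window[c] == target.get(c, 0)
            window[c] += delta
            after = window[c] == target.get(c, 0)
            mismatches += before - after
        if mismatches == 0:
            return True
    return False
```

For example, with `text = "abcab"`, the pattern `{"a": 1, "b": 1, "c": 1}` is matched by the substring `abc`, while `{"a": 2}` has no match because no two adjacent characters are both `a`. The trade-off is the one the abstract describes: the index answers queries quickly after quadratic preprocessing, and the sliding window needs no preprocessing but scans the whole text per query.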

In this paper we provide good reason for this. We show that, under a 3SUM-hardness assumption, jumbled indexing for alphabets of size \(\omega(1)\) requires \(\Omega(n^{2-\epsilon})\) preprocessing time or \(\Omega(n^{1-\delta})\) query time for any \(\epsilon,\delta > 0\). In fact, under a stronger 3SUM-hardness assumption, for any constant alphabet size \(r \ge 3\) there exist describable fixed constants \(\epsilon_r\) and \(\delta_r\) such that jumbled indexing requires \(\Omega(n^{2-\epsilon_r})\) preprocessing time or \(\Omega(n^{1-\delta_r})\) query time.


References

  1. Amir, A., Apostolico, A., Landau, G.M., Satta, G.: Efficient text fingerprinting via Parikh mapping. J. Discrete Algorithms 1(5-6), 409–421 (2003)

  2. Amir, A., Butman, A., Porat, E.: On the relationship between histogram indexing and block-mass indexing. In: Philosophical Transactions A (to appear)

  3. Amir, A., Church, K.W., Dar, E.: Separable attributes: a technique for solving the sub matrices character count problem. In: SODA, pp. 400–401 (2002)

  4. Amir, A., Farach, M., Muthukrishnan, S.: Alphabet dependence in parameterized matching. Inf. Process. Lett. 49(3), 111–115 (1994)

  5. Phanendra Babu, G., Mehtre, B.M., Kankanhalli, M.S.: Color indexing for efficient image retrieval. Multimedia Tools and Applications 1(4), 327–348 (1995)

  6. Badkobeh, G., Fici, G., Kroon, S., Lipták, Z.: Binary jumbled string matching for highly run-length compressible texts. Inf. Process. Lett. 113(17), 604–608 (2013)

  7. Baker, B.S.: Parameterized pattern matching: Algorithms and applications. J. Comput. Syst. Sci. 52(1), 28–42 (1996)

  8. Baker, B.S.: Parameterized duplication in strings: Algorithms and an application to software maintenance. SIAM J. Comput. 26(5), 1343–1362 (1997)

  9. Baran, I., Demaine, E.D., Pătraşcu, M.: Subquadratic algorithms for 3SUM. In: Dehne, F., López-Ortiz, A., Sack, J.-R. (eds.) WADS 2005. LNCS, vol. 3608, pp. 409–421. Springer, Heidelberg (2005)

  10. Böcker, S.: Simulating multiplexed SNP discovery rates using base-specific cleavage and mass spectrometry. Bioinformatics 23(2), 5–12 (2007)

  11. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)

  12. Bremner, D., Chan, T.M., Demaine, E.D., Erickson, J., Hurtado, F., Iacono, J., Langerman, S., Pătraşcu, M., Taslakian, P.: Necklaces, convolutions, and X + Y. Algorithmica 69, 294–314 (2014)

  13. Burcsi, P., Cicalese, F., Fici, G., Lipták, Z.: On table arrangements, scrabble freaks, and jumbled pattern matching. In: Boldi, P. (ed.) FUN 2010. LNCS, vol. 6099, pp. 89–101. Springer, Heidelberg (2010)

  14. Butman, A., Eres, R., Landau, G.M.: Scaled and permuted string matching. Inf. Process. Lett. 92(6), 293–297 (2004)

  15. Butman, A., Lewenstein, N., Munro, I.J.: Permuted scaled matching. In: CPM 2014 (to appear, 2014)

  16. Cicalese, F., Fici, G., Lipták, Z.: Searching for jumbled patterns in strings. In: Prague Stringology Conference, pp. 105–117 (2009)

  17. Cicalese, F., Laber, E.S., Weimann, O., Yuster, R.: Near linear time construction of an approximate index for all maximum consecutive sub-sums of a sequence. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 149–158. Springer, Heidelberg (2012)

  18. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC, pp. 91–100 (2004)

  19. Crochemore, M., Iliopoulos, C.S., Kociumaka, T., Kubica, M., Langiu, A., Pissis, S.P., Radoszewski, J., Rytter, W., Waleń, T.: Order-preserving incomplete suffix trees and order-preserving indexes. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 84–95. Springer, Heidelberg (2013)

  20. Durocher, S., Ian Munro, J., Mondal, D., Thankachan, S.V.: Jumbled pattern matching over large alphabets (2014) (manuscript, personal communication)

  21. Eres, R., Landau, G.M., Parida, L.: Permutation pattern discovery in biosequences. Journal of Computational Biology 11(6), 1050–1060 (2004)

  22. Gagie, T., Hermelin, D., Landau, G.M., Weimann, O.: Binary jumbled pattern matching on trees and tree-like structures. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 517–528. Springer, Heidelberg (2013)

  23. Gajentaan, A., Overmars, M.H.: On a class of \(O(n^2)\) problems in computational geometry. Comput. Geom. 5, 165–185 (1995)

  24. Giaquinta, E., Grabowski, S.: New algorithms for binary jumbled pattern matching. Inf. Process. Lett. 113(14-16), 538–542 (2013)

  25. Hazay, C., Lewenstein, M., Sokol, D.: Approximate parameterized matching. ACM Transactions on Algorithms 3(3) (2007)

  26. Hermelin, D., Landau, G.M., Rabinovich, Y., Weimann, O.: Binary jumbled pattern matching via all-pairs shortest paths (2014) (manuscript), http://arxiv.org/abs/1401.2065

  27. Holub, S.: Parikh test sets for commutative languages. ITA 42(3), 525–537 (2008)

  28. Huang, X., Ali, H., Sadanandam, A., Singh, R.: SRPVS: a new motif searching algorithm for protein analysis. In: Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004, pp. 674–675 (2004)

  29. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

  30. Kociumaka, T., Radoszewski, J., Rytter, W.: Efficient indexes for jumbled pattern matching with constant-sized alphabet. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 625–636. Springer, Heidelberg (2013)

  31. Kopczynski, E., Widjaja, A.: Parikh images of grammars: Complexity and applications. In: LICS, pp. 80–89 (2010)

  32. Kubica, M., Kulczynski, T., Radoszewski, J., Rytter, W., Waleń, T.: A linear time algorithm for consecutive permutation pattern matching. Inf. Process. Lett. 113(12), 430–433 (2013)

  33. Lee, L.-K., Lewenstein, M., Zhang, Q.: Parikh matching in the streaming model. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 336–341. Springer, Heidelberg (2012)

  34. Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

  35. Moosa, T.M., Rahman, M.S.: Indexing permutations for binary strings. Inf. Process. Lett. 110(18-19), 795–798 (2010)

  36. Moosa, T.M., Rahman, M.S.: Sub-quadratic time and linear space data structures for permutation matching in binary strings. J. Discrete Algorithms 10, 5–9 (2012)

  37. Parikh, R.: On context-free languages. J. ACM 13(4), 570–581 (1966)

  38. Pătraşcu, M.: Towards polynomial lower bounds for dynamic problems. In: STOC, pp. 603–610 (2010)

  39. Swain, M.J., Ballard, D.H.: Color indexing. International Journal of Computer Vision 7(1), 11–32 (1991)

  40. Weiner, P.: Linear pattern matching algorithms. In: SWAT (FOCS), pp. 1–11 (1973)

  41. Williams, R.: Faster all-pairs shortest paths via circuit complexity. In: STOC (to appear, 2014)

  42. Williams, V.V., Williams, R.: Subcubic equivalences between path, matrix and triangle problems. In: FOCS, pp. 645–654 (2010)

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amir, A., Chan, T.M., Lewenstein, M., Lewenstein, N. (2014). On Hardness of Jumbled Indexing. In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds) Automata, Languages, and Programming. ICALP 2014. Lecture Notes in Computer Science, vol 8572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43948-7_10

  • DOI: https://doi.org/10.1007/978-3-662-43948-7_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43947-0

  • Online ISBN: 978-3-662-43948-7

  • eBook Packages: Computer Science (R0)
