Top-k Document Retrieval in Compact Space and Near-Optimal Time

  • Conference paper
Algorithms and Computation (ISAAC 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8283)

Abstract

Let \(\mathcal{D} = \{d_1, d_2, \ldots, d_D\}\) be a given set of \(D\) string documents of total length \(n\). Our task is to index \(\mathcal{D}\) such that the \(k\) most relevant documents for an online query pattern \(P\) of length \(p\) can be retrieved efficiently. There exist linear-space data structures of \(O(n)\) words for answering such queries in optimal \(O(p + k)\) time. In this paper, we describe a compact index of size \(|\mathrm{CSA}| + n\log D + o(n\log D)\) bits with near-optimal time, \(O(p + k\log^* n)\), for the basic relevance metric term-frequency, where \(|\mathrm{CSA}|\) is the size (in bits) of a compressed full-text index of \(\mathcal{D}\), and \(\log^* n\) is the iterated logarithm of \(n\).
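
To make the problem statement concrete, the following is a minimal sketch of a brute-force baseline, not the compact index described in the paper. It scans every document for each query, so it takes time proportional to the collection size rather than O(p + k log* n). The function names (`term_frequency`, `top_k_naive`, `log_star`), the 0-based document identifiers, the tie-breaking by smaller identifier, and the base-2 logarithm are all assumptions made for illustration.

```python
import heapq
from math import log2


def term_frequency(doc: str, pattern: str) -> int:
    """Count occurrences of pattern in doc, allowing overlaps."""
    if not pattern:
        return 0
    count = 0
    start = doc.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted.
        start = doc.find(pattern, start + 1)
    return count


def top_k_naive(docs: list[str], pattern: str, k: int) -> list[tuple[int, int]]:
    """Return up to k (doc_id, tf) pairs with the highest term frequency.

    Documents in which the pattern does not occur are skipped.
    Ties are broken in favour of the smaller doc_id (an assumption).
    """
    hits = []
    for doc_id, doc in enumerate(docs):
        tf = term_frequency(doc, pattern)
        if tf > 0:
            hits.append((doc_id, tf))
    return heapq.nlargest(k, hits, key=lambda item: (item[1], -item[0]))


def log_star(n: float) -> int:
    """Iterated logarithm: number of times log2 is applied until the value is <= 1."""
    steps = 0
    while n > 1:
        n = log2(n)
        steps += 1
    return steps


if __name__ == "__main__":
    docs = ["abracadabra", "banana", "cabana", "xyz"]
    print(top_k_naive(docs, "a", 2))
    print(log_star(65536))
```

The `log_star` helper illustrates why the k log* n term is described as near-optimal: the iterated logarithm grows extremely slowly, so for any practical n it is a very small constant factor on top of k.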



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Navarro, G., Thankachan, S.V. (2013). Top-k Document Retrieval in Compact Space and Near-Optimal Time. In: Cai, L., Cheng, SW., Lam, TW. (eds) Algorithms and Computation. ISAAC 2013. Lecture Notes in Computer Science, vol 8283. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45030-3_37

  • DOI: https://doi.org/10.1007/978-3-642-45030-3_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45029-7

  • Online ISBN: 978-3-642-45030-3

  • eBook Packages: Computer Science, Computer Science (R0)
