ABSTRACT
Succinct data structures are used today in many information retrieval applications, e.g., posting list representation, language model representation, indexing (social) graphs, query auto-completion, document retrieval, and indexing dictionaries of strings, to mention only the most recent ones. This new kind of data structure mimics the operations of its classical counterpart within a comparable time complexity but requires much less space. With the availability of several libraries for basic succinct structures, such as SDSL, Succinct, Facebook's Folly, and Sux, it is relatively easy to profit directly from advances in this field. In this tutorial we will introduce this field of research by presenting the most important succinct data structures for representing sets of integers, sets of points, trees, graphs, and strings, together with their most important applications to Information Retrieval problems. The introduction of the succinct data structures will be supported by a practical session with programming handouts to solve. This will allow the attendees to experiment directly with implementations of these solutions on real datasets and to understand the potential benefits they can bring to their own projects.
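To make the idea of "same operations, less space" concrete, the following is a minimal sketch of a rank query over a bit vector that represents a set of integers. It is not taken from the tutorial or from any of the libraries named above; the class name RankBitVector, the method names, and the block size of 64 are assumptions chosen for illustration. The set is stored as one bit per universe element plus one counter per block, and rank1(i) reports how many elements are smaller than i by combining a stored counter with a population count inside a single block.

```python
class RankBitVector:
    """Bit vector with sampled popcounts (illustrative sketch, hypothetical names)."""

    BLOCK = 64  # bits per block; an assumed, illustrative choice

    def __init__(self, bits):
        self.n = len(bits)
        self.blocks = []    # each block of bits packed into one integer
        self.prefix = [0]   # prefix[b] = number of 1 bits before block b
        for start in range(0, self.n, self.BLOCK):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.BLOCK]):
                if bit:
                    word |= 1 << offset
            self.blocks.append(word)
            self.prefix.append(self.prefix[-1] + bin(word).count("1"))

    def rank1(self, i):
        """Return the number of 1 bits in positions [0, i)."""
        if not 0 <= i <= self.n:
            raise IndexError(i)
        block, offset = divmod(i, self.BLOCK)
        if offset == 0:
            return self.prefix[block]
        mask = (1 << offset) - 1  # keep only the bits below position i
        return self.prefix[block] + bin(self.blocks[block] & mask).count("1")

    def contains(self, x):
        """Return True if integer x belongs to the represented set."""
        if not 0 <= x < self.n:
            return False
        return (self.blocks[x // self.BLOCK] >> (x % self.BLOCK)) & 1 == 1


# Represent the set {2, 3, 5, 70, 130} over the universe [0, 200).
members = {2, 3, 5, 70, 130}
bv = RankBitVector([1 if x in members else 0 for x in range(200)])
print(bv.rank1(71), bv.contains(70), bv.contains(71))
```

A plain bit vector like this one is only a building block: for sparse sets, compressed encodings such as Elias-Fano use far less space, and the libraries listed in the abstract provide tuned implementations of such structures.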