Vec2Doc: Transforming Dense Vectors into Sparse Representations for Efficient Information Retrieval

  • Conference paper
Similarity Search and Applications (SISAP 2023)

Abstract

The rapid development of deep learning and artificial intelligence has transformed our approach to solving scientific problems across various domains, including computer vision, natural language processing, and automatic content generation. Information retrieval (IR) has also experienced significant advancements, with natural language understanding and multimodal content analysis enabling more accurate retrieval. However, the widespread adoption of neural networks has also shifted the focus of IR problem-solving, which nowadays predominantly relies on evaluating the similarity of dense vectors derived from the latent spaces of deep neural networks. Nevertheless, the challenges of conducting similarity searches on large-scale databases with billions of vectors persist. Traditional IR approaches use inverted indices and vector space models, which work well with sparse vectors. In this paper, we propose Vec2Doc, a novel method that converts dense vectors into sparse integer vectors, allowing for the use of inverted indices. A preliminary experimental evaluation suggests that the method is a promising solution for large-scale vector-based IR problems.
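To make the idea of searching dense vectors through an inverted index concrete, the following is a minimal sketch of a generic dense-to-sparse conversion. It is not the Vec2Doc algorithm, which the abstract does not specify. The function names (`to_sparse_int`, `build_index`, `search`), the parameters `scale` and `keep`, and the choice of a sign split followed by top-k selection and rounding are all assumptions made for illustration.

```python
from collections import defaultdict

def to_sparse_int(vec, scale=10, keep=2):
    """Hypothetical conversion of a dense vector into {term_id: integer weight}.

    Illustrative only; this is NOT the method proposed in the paper.
    """
    d = len(vec)
    # Sign split: positive parts fill terms 0..d-1, negative parts fill d..2d-1,
    # so every weight is non-negative, like a term frequency.
    split = [max(x, 0.0) for x in vec] + [max(-x, 0.0) for x in vec]
    # Keep only the `keep` largest components to enforce sparsity.
    top = sorted(range(2 * d), key=lambda i: split[i], reverse=True)[:keep]
    sparse = {}
    for i in top:
        tf = round(split[i] * scale)  # quantize to an integer weight
        if tf > 0:
            sparse[i] = tf
    return sparse

def build_index(docs):
    """Build an inverted index: term_id -> list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, tf in to_sparse_int(vec).items():
            index[term].append((doc_id, tf))
    return index

def search(index, query):
    """Score documents by the dot product of the sparse integer vectors."""
    scores = defaultdict(int)
    for term, q_tf in to_sparse_int(query).items():
        for doc_id, d_tf in index.get(term, []):
            scores[doc_id] += q_tf * d_tf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {"a": [0.9, -0.1, 0.2], "b": [-0.8, 0.5, 0.0]}
index = build_index(docs)
results = search(index, [1.0, 0.0, 0.1])
```

In this toy setup only the posting lists of the query's non-zero terms are visited, which is the property that makes inverted indices attractive for large collections.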

Acknowledgements

This work was partially funded by AI4Media - A European Excellence Centre for Media, Society, and Democracy (EC, H2020 n. 951911), SUN - Social and hUman ceNtered XR (EC, Horizon Europe n. 101092612), and National Centre for HPC, Big Data and Quantum Computing - HPC (CUP B93C22000620006).

Corresponding author

Correspondence to Fabio Carrara.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Carrara, F., Gennaro, C., Vadicamo, L., Amato, G. (2023). Vec2Doc: Transforming Dense Vectors into Sparse Representations for Efficient Information Retrieval. In: Pedreira, O., Estivill-Castro, V. (eds) Similarity Search and Applications. SISAP 2023. Lecture Notes in Computer Science, vol 14289. Springer, Cham. https://doi.org/10.1007/978-3-031-46994-7_18

  • DOI: https://doi.org/10.1007/978-3-031-46994-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46993-0

  • Online ISBN: 978-3-031-46994-7

  • eBook Packages: Computer Science, Computer Science (R0)
