skip to main content
research-article

Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases

Published:30 May 2023Publication History
Skip Abstract Section

Abstract

Approximate nearest neighbor (ANN) search is a fundamental search in multi-dimensional databases, which has numerous real-world applications, such as image retrieval, recommendation, entity resolution, and sequence matching. Proximity graph (PG) has been the state-of-the-art index for ANN search. However, the search on existing PGs either suffers from a high time complexity or has no performance guarantee on the search result. In this paper, we propose a novel τ-monotonic graph (τ- MG) to address the limitations. The novelty of τ-MG lies in a τ-monotonic property. Based on this property, we prove that if the distance between a query q and its nearest neighbor is less than a constant τ, the search on τ-MG guarantees to find the exact nearest neighbor of q and the time complexity of the search is smaller than all existing PG-based methods. For index construction efficiency, we propose an approximate variant of τ-MG, namely τ-monotonic neighborhood graph (τ- MNG), which only requires the neighborhood of each node to be τ-monotonic. We further propose an optimization to reduce the number of distance computations in search. Our extensive experiments show that our techniques outperform all existing methods on well-known real-world datasets.

Skip Supplemental Material Section

Supplemental Material

PACMMOD-V1mod054.mp4

mp4

106.8 MB

References

  1. Laurent Amsaleg, Oussama Chelly, Teddy Furon, Stéphane Girard, Michael E. Houle, Ken-ichi Kawarabayashi, and Michael Nett. 2015. Estimating Local Intrinsic Dimensionality. In Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). 29--38.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Laurent Amsaleg, Björn Þór Jónsson, and Herwig Lejsek. 2018. Scalability of the NV-tree: Three Experiments. In Proceedings of the International Conference on Similarity Search and Applications (SISAP), Vol. 11223. 59--72.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Anon. 2010. Datasets for approximate nearest neighbor search. Retrieved May 2022 from http://corpus-texmex.irisa.fr/.Google ScholarGoogle Scholar
  4. Anon. 2011. Million Song Dataset Benchmarks. Retrieved May 2020 from http://www.ifs.tuwien.ac.at/mir/msd/.Google ScholarGoogle Scholar
  5. Anon. unknown. Common Crawl. Retrieved April 2020 from http://commoncrawl.org/.Google ScholarGoogle Scholar
  6. Martin Aumüller, Erik Bernhardsson, and Alexander Faithfull. 2020. ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems, Vol. 87 (2020), 101374.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Franz Aurenhammer. 1991. Voronoi Diagrams - A Survey of a Fundamental Geometric Data Structure. Comput. Surveys, Vol. 23, 3 (1991), 345--405.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Dmitry Baranchuk, Artem Babenko, and Yury Malkov. 2018. Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors. In Proceedings of the European Conference on Computer Vision (ECCV), Vol. 11216. 209--224.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Dmitry Baranchuk, Dmitry Persiyanov, Anton Sinitsin, and Artem Babenko. 2019. Learning to route in similarity graphs. In Proceedings of the International Conference on Machine Learning (ICML). 475--484.Google ScholarGoogle Scholar
  10. Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James P. Drake, Jane M. Landolin, and Adam M. Phillippy. 2015. Assembling Large Genomes with Single-Molecule Sequencing and Locality-Sensitive Hashing. Nature Biotechnology, Vol. 33, 6 (2015), 623--630.Google ScholarGoogle ScholarCross RefCross Ref
  11. Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. 1999. When Is ”Nearest Neighbor” Meaningful?. In Proceedings of the International Conference on Database Theory (ICDT). 217--235.Google ScholarGoogle ScholarCross RefCross Ref
  12. Abhinandan Das, Mayur Datar, Ashutosh Garg, and Shyamsundar Rajaram. 2007. Google news personalization: scalable online collaborative filtering. In Proceedings of the International Conference on World Wide Web (WWW). 271--280.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D.W. Dearholt, N. Gonzales, and G. Kurup. 1988. Monotonic Search Networks For Computer Vision Databases. In Proceedings of the Asilomar Conference on Signals, Systems and Computers, Vol. 2. 548--553.Google ScholarGoogle Scholar
  14. Wei Dong, Moses Charikar, and Kai Li. 2011. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the International Conference on World Wide Web (WWW). 577--586.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Cong Fu and Deng Cai. 2016. EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph. CoRR, Vol. abs/1609.07228 (2016).Google ScholarGoogle Scholar
  16. Cong Fu, Changxu Wang, and Deng Cai. 2021. High Dimensional Similarity Search with Satellite System Graph: Efficiency, Scalability, and Unindexed Query Compatibility. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1--1.Google ScholarGoogle Scholar
  17. Cong Fu, Chao Xiang, Changxu Wang, and Deng Cai. 2019. Fast Approximate Nearest Neighbor Search with the Navigating Spreading-out Graph. Proceedings of the VLDB Endowment, Vol. 12, 5 (2019), 461--474.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Gylfi Þó r Gudmundsson, Bjö rn Þó r Jó nsson, Laurent Amsaleg, and Michael J. Franklin. 2018. Prototyping a Web-Scale Multimedia Retrieval Service Using Spark. ACM Transactions on Multimedia Computing, Communications, and Applications, Vol. 14, 3s (2018), 65:1--65:24.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Kiana Hajebi, Yasin Abbasi-Yadkori, Hossein Shahbazi, and Hong Zhang. 2011. Fast Approximate Nearest-Neighbor Search with k-Nearest Neighbor Graph. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 1312--1317.Google ScholarGoogle Scholar
  20. Ben Harwood and Tom Drummond. 2016. FANNG: Fast Approximate Nearest Neighbour Graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 5713--5722.Google ScholarGoogle ScholarCross RefCross Ref
  21. Johannes Hoffart, Stephan Seufert, Dat Ba Nguyen, Martin Theobald, and Gerhard Weikum. 2012. KORE: keyphrase overlap relatedness for entity disambiguation. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM). 545--554.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Omid Jafari, Preeti Maurya, Parth Nagarkar, Khandker Mushfiqul Islam, and Chidambaram Crushev. 2021. A Survey on Locality Sensitive Hashing Algorithms and their Applications. CoRR, Vol. abs/2102.08942 (2021).Google ScholarGoogle Scholar
  23. J.W. Jaromczyk and G.T. Toussaint. 1992. Relative neighborhood graphs and their relatives. Proc. IEEE, Vol. 80, 9 (1992), 1502--1517.Google ScholarGoogle ScholarCross RefCross Ref
  24. Pennington Jeffrey, Socher Richard, and D. Manning Christopher. 2015. GloVe: Global Vectors for Word Representation. Retrieved May 2022 from http://nlp.stanford.edu/projects/glove/.Google ScholarGoogle Scholar
  25. Hervé Jé gou, Matthijs Douze, and Cordelia Schmid. 2011. Product Quantization for Nearest Neighbor Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, 1 (2011), 117--128.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Zhongming Jin, Debing Zhang, Yao Hu, Shiding Lin, Deng Cai, and Xiaofei He. 2014. Fast and Accurate Hashing Via Iterative Nearest Neighbors Expansion. IEEE Transactions on Cybernetics, Vol. 44, 11 (2014), 2167--2177.Google ScholarGoogle ScholarCross RefCross Ref
  27. Jon Kleinberg. 2000. The Small-World Phenomenon: An Algorithmic Perspective. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC). 163--170.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Brian Kulis and Kristen Grauman. 2009. Kernelized locality-sensitive hashing for scalable image search. In Proceedings of the International Conference on Computer Vision (ICCV). 2130--2137.Google ScholarGoogle ScholarCross RefCross Ref
  29. Govinda D. Kurup. 1992. Database Organized on the Basis of Similarities with Applications in Computer Vision. Ph.,D. Dissertation.Google ScholarGoogle Scholar
  30. Herwig Lejsek, Friðrik Heiðar Á smundsson, Bjö rn Þó r Jó nsson, and Laurent Amsaleg. 2009. NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, 5 (2009), 869--883.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Conglong Li, Minjia Zhang, David G. Andersen, and Yuxiong He. 2020. Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination. In Proceedings of the ACM International Conference on Management of Data (SIGMOD). 2539--2554.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. W. Li, Y. Zhang, Y. Sun, W. Wang, M. Li, W. Zhang, and X. Lin. 2020. Approximate Nearest Neighbor Search on High Dimensional Data - Experiments, Analyses, and Improvement. IEEE Transactions on Knowledge and Data Engineering, Vol. 32, 8 (2020), 1475--1488.Google ScholarGoogle ScholarCross RefCross Ref
  33. Ting Liu, Andrew Moore, Ke Yang, and Alexander Gray. 2004. An Investigation of Practical Approximate Nearest Neighbor Algorithms. In Advances in Neural Information Processing Systems, Vol. 17.Google ScholarGoogle Scholar
  34. Wanqi Liu, Hanchen Wang, Ying Zhang, Wei Wang, Lu Qin, and Xuemin Lin. 2021. EI-LSH: An early-termination driven I/O efficient incremental c-approximate nearest neighbor search. VLDB Journal, Vol. 30, 2 (2021), 215--235.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Yury Malkov, Alexander Ponomarenko, Andrey Logvinov, and Vladimir Krylov. 2014. Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems, Vol. 45 (2014), 61--68.Google ScholarGoogle ScholarCross RefCross Ref
  36. Y. A. Malkov and D. A. Yashunin. 2020. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 42, 4 (2020), 824--836.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Charles U. Martel and Van Nguyen. 2004. Analyzing Kleinberg's (and other) small-world Models. In Proceedings of the Annual ACM Symposium on Principles of Distributed Computing (PODC). ACM, 179--188.Google ScholarGoogle Scholar
  38. Yusuke Matsui, Yusuke Uchida, Hervé Jégou, and Shin'ichi Satoh. 2018. A Survey of Product Quantization. ITE Transactions on Media Technology and Applications, Vol. 6, 1 (2018), 2--10.Google ScholarGoogle ScholarCross RefCross Ref
  39. Stanley Milgram. 1967. The small world problem. Psychology Today, Vol. 1, 1 (1967), 61--67.Google ScholarGoogle Scholar
  40. Stanislav Morozov and Artem Babenko. 2018. Non-metric Similarity Graphs for Maximum Inner Product Search. In Advances in Neural Information Processing Systems, Vol. 31.Google ScholarGoogle Scholar
  41. Toshiro Ogita, Hidetomo Ichihashi, Akira Notsu, and Katsuhiro Honda. 2014. Improvement of PCA-Based Approximate Nearest Neighbor Search Using Distance Statistics. Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 18, 4 (2014), 658--664.Google ScholarGoogle ScholarCross RefCross Ref
  42. Marco Patella and Paolo Ciaccia. 2008. The Many Facets of Approximate Similarity Search. In Proceedings of the First International Workshop on Similarity Search and Applications (SISAP). 10--21.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Yun Peng, Byron Choi, Tsz Nam Chan, and Jianliang Xu. 2022. LAN: Learning-based Approximate k-Nearest Neighbor Search in Graph Databases. In Proceedings of the International Conference on Data Engineering (ICDE). 2508--2521.Google ScholarGoogle ScholarCross RefCross Ref
  44. James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. 2007. Object retrieval with large vocabularies and fast spatial matching. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).Google ScholarGoogle ScholarCross RefCross Ref
  45. Liudmila Prokhorenkova and Aleksandr Shekhovtsov. 2020a. Graph-based Nearest Neighbor Search: From Practice to Theory. In Proceedings of the International Conference on Machine Learning (ICML). 7803--7813.Google ScholarGoogle Scholar
  46. Liudmila Prokhorenkova and Aleksandr Shekhovtsov. 2020b. Graph-based Nearest Neighbor Search: From Practice to Theory. In Proceedings of the International Conference on Machine Learning (ICML)), Vol. 119. 7803--7813.Google ScholarGoogle Scholar
  47. Uri Shaft and Raghu Ramakrishnan. 2005. When Is Nearest Neighbors Indexable?. In Proceedings of the International Conference on Database Theory (ICDT). 158--172.Google ScholarGoogle Scholar
  48. Larissa C. Shimomura, Rafael Seidi Oyamada, Marcos R. Vieira, and Daniel S. Kaster. 2021. A survey on graph-based methods for similarity searches in metric spaces. Information Systems, Vol. 95 (2021), 101507.Google ScholarGoogle ScholarCross RefCross Ref
  49. Yifang Sun, Wei Wang, Jianbin Qin, Ying Zhang, and Xuemin Lin. 2014. SRS: Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index. Proceedings of the VLDB Endowment, Vol. 8, 1 (2014), 1--12.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Javier Vargas Muñoz, Marcos A. Gonçalves, Zanoni Dias, and Ricardo da S. Torres. 2019. Hierarchical Clustering-Based Graphs for Large Scale Approximate Nearest Neighbor Search. Pattern Recognition (PR), Vol. 96 (2019), 106970.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Hongya Wang, Zhizheng Wang, Wei Wang, Yingyuan Xiao, Zeng Zhao, and Kaixiang Yang. 2020. A Note on Graph-Based Nearest Neighbor Search. arxiv: 2012.11083 [cs.LG]Google ScholarGoogle Scholar
  52. Jingdong Wang, Naiyan Wang, You Jia, Jian Li, Gang Zeng, Hongbin Zha, and Xian-Sheng Hua. 2014. Trinary-Projection Trees for Approximate Nearest Neighbor Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 36, 2 (2014), 388--403.Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Mengzhao Wang, Xiaoliang Xu, Qiang Yue, and Yuxiang Wang. 2021. A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search. Proceedings of the VLDB Endowment, Vol. 14, 11 (2021), 1964--1978.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of 'small-world' networks. Nature, Vol. 339 (1998), 440--442.Google ScholarGoogle ScholarCross RefCross Ref
  55. Xiang Wu, Ruiqi Guo, Ananda Theertha Suresh, Sanjiv Kumar, Daniel N. Holtmann-Rice, David Simcha, and Felix X. Yu. 2017. Multiscale Quantization for Fast Similarity Search. In Advances in Neural Information Processing Systems. 5745--5755.Google ScholarGoogle Scholar
  56. Zhaozhuo Xu, Weijie Zhao, Shulong Tan, Zhixin Zhou, and Ping Li. 2022. Proximity Graph Maintenance for Fast Online Nearest Neighbor Search. arXiv (2022).Google ScholarGoogle Scholar
  57. Artem Babenko Yandex and Victor Lempitsky. 2016. Efficient Indexing of Billion-Scale Datasets of Deep Descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2055--2063.Google ScholarGoogle ScholarCross RefCross Ref
  58. Yuanhang Yu, Dong Wen?, Ying Zhang, Lu Qin, Wenjie Zhang, and Xuemin Lin. 2022. GPU-accelerated Proximity Graph Approximate Nearest Neighbor Search and Construction. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). 552--564.Google ScholarGoogle ScholarCross RefCross Ref
  59. Weijie Zhao, Shulong Tan, and Ping Li. 2020. SONG: Approximate Nearest Neighbor Search on GPU. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). 1033--1044.Google ScholarGoogle ScholarCross RefCross Ref
  60. Yuxin Zheng, Qi Guo, Anthony K.H. Tung, and Sai Wu. 2016. LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index. In Proceedings of the International Conference on Management of Data (SIGMOD). 2023--2037.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Proceedings of the ACM on Management of Data
        Proceedings of the ACM on Management of Data  Volume 1, Issue 1
        PACMMOD
        May 2023
        2807 pages
        EISSN:2836-6573
        DOI:10.1145/3603164
        Issue’s Table of Contents

        Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 30 May 2023
        Published in pacmmod Volume 1, Issue 1

        Permissions

        Request permissions about this article.

        Request Permissions

        Qualifiers

        • research-article
      • Article Metrics

        • Downloads (Last 12 months)515
        • Downloads (Last 6 weeks)76

        Other Metrics

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader