Abstract
Approximate nearest neighbor (ANN) search is a fundamental operation in multi-dimensional databases, with numerous real-world applications such as image retrieval, recommendation, entity resolution, and sequence matching. The proximity graph (PG) has been the state-of-the-art index for ANN search. However, search on existing PGs either suffers from high time complexity or offers no performance guarantee on the search result. In this paper, we propose a novel τ-monotonic graph (τ-MG) to address these limitations. The novelty of τ-MG lies in its τ-monotonic property. Based on this property, we prove that if the distance between a query q and its nearest neighbor is less than a constant τ, the search on τ-MG is guaranteed to find the exact nearest neighbor of q, and the time complexity of the search is lower than that of all existing PG-based methods. For index construction efficiency, we propose an approximate variant of τ-MG, namely the τ-monotonic neighborhood graph (τ-MNG), which only requires the neighborhood of each node to be τ-monotonic. We further propose an optimization to reduce the number of distance computations during search. Our extensive experiments show that our techniques outperform all existing methods on well-known real-world datasets.
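Search on a proximity graph is commonly a greedy walk: starting from some node, move to whichever neighbor is closest to the query, and stop when no neighbor is closer. The following Python sketch illustrates that generic routing step only. It assumes Euclidean distance and a hand-built toy graph; the names `greedy_search`, `points`, and `neighbors` are hypothetical, and the code does not implement the τ-MG construction or the search procedure proposed in the paper.

```python
import math


def greedy_search(points, neighbors, query, start):
    """Generic greedy routing on a proximity graph (illustrative only).

    points:    list of coordinate tuples, indexed by node id
    neighbors: dict mapping node id -> list of adjacent node ids
    query:     coordinate tuple of the query point
    start:     node id where the walk begins
    Returns (node id, distance) of the local optimum reached.
    """
    current = start
    current_dist = math.dist(points[current], query)
    while True:
        best, best_dist = current, current_dist
        # Examine every out-neighbor and remember the one closest to the query.
        for nb in neighbors[current]:
            d = math.dist(points[nb], query)
            if d < best_dist:
                best, best_dist = nb, d
        # No neighbor improves on the current node: stop here.
        if best == current:
            return current, current_dist
        current, current_dist = best, best_dist


# Toy example: four points on a line, connected as a path graph.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
node, dist = greedy_search(points, neighbors, (2.9, 0.0), start=0)
print(node, dist)
```

On an arbitrary graph this walk can stop at a node that is not the true nearest neighbor. The guarantee stated in the abstract concerns graphs with the τ-monotonic property, which this toy graph is not claimed to satisfy.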
Index Terms
- Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases