Abstract
Numerous techniques have been proposed in the past for supporting efficient k-nearest neighbor (k-NN) queries in continuous data spaces. Limited work has been reported in the literature for k-NN queries in a nonordered discrete data space (NDDS). Performing k-NN queries in an NDDS raises new challenges. The Hamming distance is usually used to measure the distance between two vectors (objects) in an NDDS. Due to the coarse granularity of the Hamming distance, a k-NN query in an NDDS may lead to a high degree of nondeterminism for the query result. We propose a new distance measure, called Granularity-Enhanced Hamming (GEH) distance, which effectively reduces the number of candidate solutions for a query. We have also implemented k-NN queries using multidimensional database indexing in NDDSs. Further, we use the properties of our multidimensional NDDS index to derive the probability of encountering valid neighbors within specific regions of the index. This probability is used to develop a new search ordering heuristic. Our experiments on synthetic and genomic data sets demonstrate that our index-based k-NN algorithm is efficient in finding k-NNs in both uniform and nonuniform data sets in NDDSs and that our heuristics are effective in improving the performance of such queries.
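To make the nondeterminism concrete, the following minimal sketch computes Hamming distances over a small, made-up set of letter strings and counts how many vectors tie at or below the k-th distance. The data set, the query, and the helper names `hamming` and `knn_with_ties` are illustrative assumptions, not taken from the paper. The sketch uses a plain linear scan, not the index-based algorithm, and does not implement the GEH distance.

```python
def hamming(a, b):
    """Number of dimensions in which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x != y for x, y in zip(a, b))


def knn_with_ties(query, data, k):
    """Return (k-th smallest distance, every vector within that distance).

    A linear scan for illustration only. Any vector in the returned list
    could legitimately appear in some valid k-NN answer.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    distances = [(hamming(query, v), v) for v in data]
    kth = sorted(d for d, _ in distances)[k - 1]
    candidates = [v for d, v in distances if d <= kth]
    return kth, candidates


# Hypothetical 4-dimensional vectors over the alphabet {a, c, g, t}.
data = ["acgt", "acga", "acct", "aggt", "tcgt", "ttta"]
query = "acgg"

kth, candidates = knn_with_ties(query, data, k=3)
# Five vectors lie within distance 2, but only three are requested,
# so several different result sets are equally valid answers.
print(kth, candidates)
```

Because the Hamming distance takes only integer values from 0 to the number of dimensions, many vectors share the same distance to the query. A finer-grained measure reduces the number of such ties.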
Index Terms
- Efficient k-nearest neighbor searching in nonordered discrete data spaces