ABSTRACT
Given a directed graph G, a reachability query (u, v) asks whether there exists a path from node u to node v in G. Existing studies support reachability queries using indexing techniques that require both the graph and the index to reside in main memory. However, these techniques cannot handle reachability queries on massive graphs whose graph and index cannot be held entirely in memory, because of the high I/O cost incurred. In this paper, we focus on how to minimize the I/O cost of answering reachability queries on massive graphs that cannot reside entirely in memory. First, we propose a new Yes-Label scheme, as a complement to the No-Label used in GRAIL [23], to reduce the number of intermediate results generated. Second, we show how to minimize the number of I/Os using a heap-on-disk data structure when traversing a graph. We also propose new methods to partition the heap-on-disk in order to ensure that only sequential I/Os are performed. Third, we analyze our approaches and show how to extend them to answer multiple reachability queries effectively. Finally, we conducted extensive performance studies on both large synthetic and large real graphs, which confirm the efficiency of our approaches.
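The labeling idea behind the first contribution can be illustrated with a small in-memory sketch. GRAIL [23] assigns each node u of a DAG an interval [low_u, post_u] from a traversal such that, if u reaches v, v's interval is nested in u's; a failed containment therefore answers "no" without any search. The abstract does not define the Yes-Label, so the sketch assumes, purely for illustration, a positive certificate built from the same DFS: the post-order range of u's DFS subtree, which contains post_v only if v is a tree descendant of u (and hence reachable). The functions `build_labels`, `nested`, and `reachable`, the use of a single traversal (GRAIL uses several randomized ones), and the toy graph are hypothetical; nothing here models the heap-on-disk or its I/O behavior.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]   # adjacency lists of a DAG (SCCs assumed already condensed)
Interval = Tuple[int, int]


def build_labels(g: Graph) -> Tuple[Dict[int, Interval], Dict[int, Interval]]:
    """One DFS computing two interval labels per node (hypothetical helper).

    no_label[u]  = (low_u, post_u): GRAIL-style; u reaches v implies no_label[v] nested in no_label[u].
    yes_label[u] = (tree_low_u, post_u): ASSUMED stand-in for a Yes-Label, the post-order
                   range of u's DFS subtree; post_v inside it implies u reaches v.
    """
    nodes = sorted(set(g) | {w for ws in g.values() for w in ws})
    post: Dict[int, int] = {}
    low: Dict[int, int] = {}
    tree_low: Dict[int, int] = {}
    counter = 1  # next post-order rank to assign

    def dfs(u: int) -> None:
        nonlocal counter
        tree_low[u] = counter  # smallest rank any node in u's DFS subtree will receive
        post[u] = -1           # mark u as visited before exploring its children
        for w in g.get(u, []):
            if w not in post:
                dfs(w)
        post[u] = counter
        counter += 1
        # low_u: minimum of u's own rank and the low of every child (tree or non-tree edge);
        # in a DAG every child is already finished here, so low[w] exists
        low[u] = min([post[u]] + [low[w] for w in g.get(u, [])])

    for u in nodes:
        if u not in post:
            dfs(u)
    no_label = {u: (low[u], post[u]) for u in nodes}
    yes_label = {u: (tree_low[u], post[u]) for u in nodes}
    return no_label, yes_label


def nested(inner: Interval, outer: Interval) -> bool:
    """True if interval `inner` lies inside interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def reachable(g: Graph, no_label: Dict[int, Interval],
              yes_label: Dict[int, Interval], u: int, v: int) -> bool:
    """Label-pruned DFS: stop early on a Yes certificate, skip nodes the No-Label rules out."""
    post_v = yes_label[v][1]
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        lo, hi = yes_label[x]
        if lo <= post_v <= hi:            # v is in x's DFS subtree, so x (hence u) reaches v
            return True
        if not nested(no_label[v], no_label[x]):
            continue                      # containment fails: x cannot reach v, prune it
        for w in g.get(x, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


g = {1: [2, 3], 2: [4], 3: [4], 5: [3]}
no_label, yes_label = build_labels(g)
print(reachable(g, no_label, yes_label, 5, 4))  # True
print(reachable(g, no_label, yes_label, 2, 3))  # False
```

Both checks cost O(1) per visited node, so the search only expands nodes that neither label can decide. In the toy graph, query (3, 2) shows a No-Label false positive: node 3's interval contains node 2's, so node 3 is expanded, but its only child 4 is then pruned and the answer is False.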
REFERENCES
[1] R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient management of transitive relationships in large data and knowledge bases. In Proc. of SIGMOD'89, 1989.
[2] K. Anyanwu and A. Sheth. ρ-queries: enabling querying for semantic associations on the semantic web. In Proc. of WWW'03, 2003.
[3] R. Bramandia, B. Choi, and W. K. Ng. On incremental maintenance of 2-hop labeling of graphs. In Proc. of WWW'08, 2008.
[4] L. Chen, A. Gupta, and M. E. Kurul. Stack-based algorithms for pattern matching on DAGs. In Proc. of VLDB'05, 2005.
[5] Y. Chen and Y. Chen. An efficient algorithm for answering graph reachability queries. In Proc. of ICDE'08, 2008.
[6] J. Cheng, J. X. Yu, X. Lin, H. Wang, and P. S. Yu. Fast computation of reachability labeling for large graphs. In Proc. of EDBT'06, 2006.
[7] J. Cheng, J. X. Yu, X. Lin, H. Wang, and P. S. Yu. Fast computing reachability labelings for large graphs with high compression rate. In Proc. of EDBT'08, 2008.
[8] E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. In Proc. of SODA'02, 2002.
[9] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2001.
[10] H. He, H. Wang, J. Yang, and P. S. Yu. Compact reachability labeling for graph-structured data. In Proc. of CIKM'05, 2005.
[11] H. V. Jagadish. A compression technique to materialize transitive closure. ACM Trans. Database Syst., 15(4):558-598, 1990.
[12] R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-HOP: A high-compression indexing scheme for reachability query. In Proc. of SIGMOD'09, 2009.
[13] R. Jin, Y. Xiang, N. Ruan, and H. Wang. Efficiently answering reachability queries on very large directed graphs. In Proc. of SIGMOD'08, 2008.
[14] L. Roditty and U. Zwick. A fully dynamic reachability algorithm for directed graphs with an almost linear update time. In Proc. of STOC'04, 2004.
[15] R. Schenkel, A. Theobald, and G. Weikum. HOPI: An efficient connection index for complex XML document collections. In Proc. of EDBT'04, 2004.
[16] R. Schenkel, A. Theobald, and G. Weikum. Efficient creation and incremental maintenance of the HOPI index for complex XML document collections. In Proc. of ICDE'05, 2005.
[17] K. Simon. An improved algorithm for transitive closure on acyclic digraphs. Theor. Comput. Sci., 58(1-3):325-346, 1988.
[18] S. Trißl and U. Leser. Fast and practical indexing and querying of very large graphs. In Proc. of SIGMOD'07, 2007.
[19] J. van Helden, A. Naim, R. Mancuso, M. Eldridge, L. Wernisch, D. Gilbert, and S. Wodak. Representing and analysing molecular and cellular function using the computer. Biological Chemistry, 381(9-10), 2000.
[20] S. J. van Schaik and O. de Moor. A memory efficient reachability data structure through bit vector compression. In Proc. of SIGMOD'11, 2011.
[21] J. S. Vitter. Algorithms and data structures for external memory. Found. Trends Theor. Comput. Sci., 2:305-474, January 2008.
[22] H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. In Proc. of ICDE'06, 2006.
[23] H. Yildirim, V. Chaoji, and M. J. Zaki. GRAIL: Scalable reachability index for large graphs. PVLDB, 3(1), 2010.