DOI: 10.1145/2247596.2247651
research-article

I/O cost minimization: reachability queries processing over massive graphs

Published: 27 March 2012

ABSTRACT

Given a directed graph G, a reachability query (u, v) asks whether there exists a path from node u to node v in G. Existing studies support reachability queries using indexing techniques that require both the graph and the index to reside in main memory. They therefore cannot handle reachability queries on massive graphs: when the graph and the index cannot be held entirely in memory, the I/O cost becomes too high. In this paper, we focus on how to minimize the I/O cost when answering reachability queries on massive graphs that cannot reside entirely in memory. First, we propose a new Yes-Label scheme, as a complement to the No-Label scheme used in GRAIL [23], to reduce the number of intermediate results generated. Second, we show how to minimize the number of I/Os using a heap-on-disk data structure when traversing a graph. We also propose new methods to partition the heap-on-disk so that only sequential I/Os are performed. Third, we analyze our approaches and show how to extend them to answer multiple reachability queries effectively. Finally, we conduct extensive performance studies on both large synthetic and large real graphs, and confirm the efficiency of our approaches.
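
To make the query definition and the No-Label idea concrete, the following is a minimal in-memory sketch, not the paper's method. It assumes the input is a DAG (for example, after strongly connected components have been collapsed) and assigns each node one interval label from a randomized post-order traversal, in the spirit of GRAIL [23]. If the interval of v is not contained in the interval of u, then u certainly cannot reach v, so the search can prune that branch. The names `interval_labels` and `reachable` are hypothetical, and the Yes-Label scheme and the heap-on-disk structure are not modeled because the abstract does not specify them.

```python
import random
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]          # adjacency lists of a DAG (assumption)
Labels = Dict[int, Tuple[int, int]]   # node -> (low, post-order rank)


def interval_labels(graph: Graph, seed: int = 0) -> Labels:
    """Label every node with (low, rank) from one randomized post-order DFS.

    rank is the post-order number of the node; low is the smallest rank
    among the node and all of its descendants. Recursive, so only suited
    to small illustrative graphs.
    """
    rng = random.Random(seed)
    labels: Labels = {}
    counter = 0

    def visit(u: int) -> None:
        nonlocal counter
        children = list(graph.get(u, []))
        rng.shuffle(children)
        low = None
        for c in children:
            if c not in labels:
                visit(c)
            c_low = labels[c][0]
            low = c_low if low is None else min(low, c_low)
        counter += 1
        labels[u] = (counter if low is None else min(low, counter), counter)

    nodes = sorted(set(graph) | {v for vs in graph.values() for v in vs})
    rng.shuffle(nodes)
    for n in nodes:
        if n not in labels:
            visit(n)
    return labels


def reachable(graph: Graph, labels: Labels, u: int, v: int) -> bool:
    """Answer the reachability query (u, v) by DFS with interval pruning."""
    lo_v, hi_v = labels[v]
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        lo_x, hi_x = labels[x]
        # No-Label test: if v's interval is not inside x's, x cannot reach v.
        if not (lo_x <= lo_v and hi_v <= hi_x):
            continue
        for y in graph.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


g = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [4]}
labels = interval_labels(g)
print(reachable(g, labels, 1, 4))  # True
print(reachable(g, labels, 4, 1))  # False
```

Containment of intervals is only a necessary condition for reachability, so a positive answer still requires traversal. That leftover traversal is where intermediate results, and on disk-resident graphs the I/O cost, arise.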

References

  1. R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient management of transitive relationships in large data and knowledge bases. In Proc. of SIGMOD'89, 1989.
  2. K. Anyanwu and A. Sheth. ρ-queries: enabling querying for semantic associations on the semantic web. In Proc. of WWW'03, 2003.
  3. R. Bramandia, B. Choi, and W. K. Ng. On incremental maintenance of 2-hop labeling of graphs. In Proc. of WWW'08, 2008.
  4. L. Chen, A. Gupta, and M. E. Kurul. Stack-based algorithms for pattern matching on dags. In Proc. of VLDB'05, 2005.
  5. Y. Chen and Y. Chen. An efficient algorithm for answering graph reachability queries. In Proc. of ICDE'08, 2008.
  6. J. Cheng, J. X. Yu, X. Lin, H. Wang, and P. S. Yu. Fast computation of reachability labeling for large graphs. In Proc. of EDBT'06, 2006.
  7. J. Cheng, J. X. Yu, X. Lin, H. Wang, and P. S. Yu. Fast computing reachability labelings for large graphs with high compression rate. In Proc. of EDBT'08, 2008.
  8. E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. In Proc. of SODA'02, 2002.
  9. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms. MIT Press, 2001.
  10. H. He, H. Wang, J. Yang, and P. S. Yu. Compact reachability labeling for graph-structured data. In Proc. of CIKM'05, 2005.
  11. H. V. Jagadish. A compression technique to materialize transitive closure. ACM Trans. Database Syst., 15(4):558--598, 1990.
  12. R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-HOP: A high-compression indexing scheme for reachability query. In Proc. of SIGMOD'09, 2009.
  13. R. Jin, Y. Xiang, N. Ruan, and H. Wang. Efficiently answering reachability queries on very large directed graphs. In Proc. of SIGMOD'08, 2008.
  14. L. Roditty and U. Zwick. A fully dynamic reachability algorithm for directed graphs with an almost linear update time. In Proc. of STOC'04, 2004.
  15. R. Schenkel, A. Theobald, and G. Weikum. Hopi: An efficient connection index for complex XML document collections. In Proc. of EDBT'04, 2004.
  16. R. Schenkel, A. Theobald, and G. Weikum. Efficient creation and incremental maintenance of the HOPI index for complex XML document collections. In Proc. of ICDE'05, 2005.
  17. K. Simon. An improved algorithm for transitive closure on acyclic digraphs. Theor. Comput. Sci., 58(1--3):325--346, 1988.
  18. S. Trißl and U. Leser. Fast and practical indexing and querying of very large graphs. In Proc. of SIGMOD'07, 2007.
  19. J. van Helden, A. Naim, R. Mancuso, M. Eldridge, L. Wernisch, D. Gilbert, and S. Wodak. Representing and analysing molecular and cellular function using the computer. Journal of Biological Chemistry, 381(9--10), 2000.
  20. S. J. van Schaik and O. de Moor. A memory efficient reachability data structure through bit vector compression. In Proc. of SIGMOD'11, 2011.
  21. J. S. Vitter. Algorithms and data structures for external memory. Found. Trends Theor. Comput. Sci., 2:305--474, January 2008.
  22. H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. In Proc. of ICDE'06, 2006.
  23. H. Yildirim, V. Chaoji, and M. J. Zaki. Grail: Scalable reachability index for large graphs. PVLDB, 3(1), 2010.

  • Published in

    EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology
    March 2012
    643 pages
    ISBN: 9781450307901
    DOI: 10.1145/2247596

    Copyright © 2012 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 7 of 10 submissions, 70%