Adding regular expressions to graph reachability and pattern queries

Research Article, Frontiers of Computer Science

Abstract

It is increasingly common to find graphs in which edges are of different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns in which an edge is specified with a regular expression of a certain form, expressing connectivity in a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries can find more meaningful information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and in cubic time for pattern queries. (2) We develop an algorithm that answers reachability queries in quadratic time, the same bound as for traditional reachability queries. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, in contrast to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) We experimentally verify the effectiveness and efficiency of these algorithms using real-life and synthetic data.
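The abstract does not spell out the query syntax or the evaluation algorithm. Purely as an illustration of what a regular expression on an edge can mean, the Python sketch below assumes an edge-labelled graph stored as adjacency lists and a restricted expression that is a concatenation of atoms. Each atom names one edge type (or a hypothetical `_` wildcard) together with a minimum and an optional maximum number of consecutive edges of that type (no maximum behaves like `c+`). It decides reachability by breadth-first search over (node, atom, count) states, a generic product-automaton technique rather than the paper's own algorithm; the names `reachable`, `Atom`, and `Graph` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical atom (edge_type, lo, hi): a run of lo..hi consecutive edges
# of edge_type, with lo >= 1; hi=None means unbounded (like c+).
Atom = Tuple[str, int, Optional[int]]
# Hypothetical graph encoding: node -> list of (target node, edge type).
Graph = Dict[str, List[Tuple[str, str]]]
State = Tuple[str, int, int]  # (current node, atom index, edges used in atom)


def _matches(atom_type: str, edge_type: str) -> bool:
    # "_" is an assumed wildcard that matches any edge type.
    return atom_type == "_" or atom_type == edge_type


def reachable(graph: Graph, source: str, target: str, expr: List[Atom]) -> bool:
    """Return True if some path from source to target spells a word of expr."""
    if not expr:
        return False
    queue: deque = deque()
    seen: Set[State] = set()

    def push(state: State) -> None:
        # Each state is explored at most once, so the search terminates.
        if state not in seen:
            seen.add(state)
            queue.append(state)

    # The first edge of the path must match the first atom.
    for u, edge_type in graph.get(source, []):
        if _matches(expr[0][0], edge_type):
            push((u, 0, 1))

    last = len(expr) - 1
    while queue:
        v, i, j = queue.popleft()
        atom_type, lo, hi = expr[i]
        # Accept once the last atom has seen at least lo edges at the target.
        if v == target and i == last and j >= lo:
            return True
        for u, edge_type in graph.get(v, []):
            # Option 1: extend the current atom if its upper bound allows.
            if _matches(atom_type, edge_type) and (hi is None or j < hi):
                # For unbounded atoms only "reached lo or not" matters,
                # so cap the counter to keep the state space finite.
                nj = j + 1 if hi is not None else min(j + 1, lo)
                push((u, i, nj))
            # Option 2: start the next atom once the current one is satisfied.
            if i < last and j >= lo and _matches(expr[i + 1][0], edge_type):
                push((u, i + 1, 1))
    return False
```

For example, with `g = {"alice": [("bob", "friend")], "bob": [("carol", "friend")], "carol": [("erin", "works_with")]}`, the call `reachable(g, "alice", "erin", [("friend", 1, 2), ("works_with", 1, 1)])` asks whether Alice reaches Erin via one or two friend edges followed by exactly one works_with edge. Because each counter is bounded by its atom's maximum (or minimum, for unbounded atoms), the number of states is at most the number of nodes times the sum of those bounds.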

Author information

Corresponding author

Correspondence to Shuai Ma.

Additional information

Wenfei Fan is Professor (Chair) of Web Data Management in the School of Informatics, University of Edinburgh, UK. He is a Fellow of the Royal Society of Edinburgh, UK, a National Professor of the Thousand-Talent Program, and a Yangtze River Scholar, China. He received his PhD from the University of Pennsylvania, and his MS and BS from Peking University. He has received the Alberto O. Mendelzon Test-of-Time Award of ACM PODS 2010, the best paper award of VLDB 2010, the Roger Needham Award in 2008 (UK), the best paper award of ICDE 2007, the Best Paper of the Year Award of Computer Networks in 2002, and the Career Award in 2001 (USA). His current research interests include database theory and systems, in particular data quality, data integration, distributed query processing, query languages, recommender systems, social networks, and web services.

Jianzhong Li is a professor and the chairman of the Department of Computer Science and Engineering at the Harbin Institute of Technology, China. He was a visiting scholar at the University of California at Berkeley in 1985. From 1986 to 1987 and from 1992 to 1993, he was a scientist in the Information Research Group in the Department of Computer Science at Lawrence Berkeley National Laboratory, USA. He was also a visiting professor at the University of Minnesota at Minneapolis, Minnesota, USA, from 1991 to 1992 and from 1998 to 1999. His current research interests include database management systems, data warehousing, data mining, and wireless sensor networks. He has authored three books and published more than 200 papers in refereed journals and conference proceedings, such as the VLDB Journal, Algorithmica, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Parallel and Distributed Systems, Distributed and Parallel Databases, SIGMOD, VLDB, ICDE, INFOCOM, and ICDCS. He has served on the program committees of major computer science and technology conferences, including SIGMOD, VLDB, ICDE, INFOCOM, ICDCS, and WWW. He has also served on the editorial boards of distinguished journals, including Knowledge and Data Engineering, and has refereed papers for various journals and proceedings.

Shuai Ma is a professor in the School of Computer Science and Engineering, Beihang University. He obtained two PhD degrees, from the University of Edinburgh in 2010 and from Peking University in 2004. He was a postdoctoral research fellow in the database group at the University of Edinburgh, and a consultant at Bell Labs, Murray Hill, USA, in the summer of 2008. His research interests include database theory and systems, data cleaning, graph matching, social data analysis, and data-intensive computing. He has published a number of papers on data quality and graph pattern matching in top database conferences and journals, such as SIGMOD, VLDB, ICDE, WWW, and the VLDB Journal. He is a recipient of the best paper award of VLDB 2010 and of the Visiting Young Faculty Program of MSRA in 2012.

Nan Tang received his PhD from the Chinese University of Hong Kong in 2007. He is currently a research scientist at QCRI (Qatar Computing Research Institute), Qatar Foundation, Qatar. He worked as a research staff member at CWI, Netherlands, from 2008 to 2010, and joined the University of Edinburgh as a research fellow in 2010. His current research interests include data quality and graph database management.

Yinghui Wu is currently a research scientist in the Department of Computer Science, University of California, Santa Barbara (UCSB). He received his PhD from the University of Edinburgh, UK, in 2010, supervised by Prof. Wenfei Fan. His research interests lie in database theory and graph database management, with emphasis on graph database models and query languages. He has published papers in SIGMOD, VLDB, ICDE, and ICDT.

About this article

Cite this article

Fan, W., Li, J., Ma, S. et al. Adding regular expressions to graph reachability and pattern queries. Front. Comput. Sci. 6, 313–338 (2012). https://doi.org/10.1007/s11704-012-1312-y

  • DOI: https://doi.org/10.1007/s11704-012-1312-y
