Abstract
It is increasingly common to find graphs in which edges are of different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity of a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems can be solved in quadratic time for reachability queries and in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries in quadratic time, the same bound as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, in contrast to graph pattern matching via subgraph isomorphism, which is NP-complete. (4) The effectiveness and efficiency of these algorithms are experimentally verified using real-life data and synthetic data.
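To make the idea of a reachability query constrained by edge types concrete, here is a minimal sketch. The abstract does not state the exact form of the regular expressions, so the sketch assumes one: a concatenation of segments, each naming an edge type (or the wildcard `_`) that must occur at least once and at most `limit` times, with `limit=None` meaning unbounded. The function name `regex_reachable`, the tuple encoding, and the edge types `fa` and `fn` are hypothetical. This is a plain breadth-first search, not the authors' algorithm.

```python
from collections import defaultdict, deque

WILDCARD = "_"  # assumed symbol that matches any edge type


def regex_reachable(edges, source, target, pattern):
    """Return True if a non-empty path from source to target matches pattern.

    edges:   iterable of (u, edge_type, v) triples of a directed graph.
    pattern: list of (edge_type, limit) segments (assumed encoding); each
             segment matches 1..limit consecutive edges of that type, and
             limit=None means "one or more". Every limit is assumed >= 1.
    """
    if not pattern:
        return False

    adj = defaultdict(list)
    for u, edge_type, v in edges:
        adj[u].append((edge_type, v))

    def matches(i, edge_type):
        return pattern[i][0] in (WILDCARD, edge_type)

    def step(i, k, edge_type):
        """Yield the (segment, count) states reachable by reading one edge."""
        limit = pattern[i][1]
        # Stay in the current segment if it still accepts another edge.
        if matches(i, edge_type) and (limit is None or k < limit):
            # For unbounded segments the count saturates at 1, so the
            # search space stays finite.
            yield (i, 1 if limit is None else k + 1)
        # Advance to the next segment once this one has matched an edge.
        if k >= 1 and i + 1 < len(pattern) and matches(i + 1, edge_type):
            yield (i + 1, 1)

    last = len(pattern) - 1
    start = (source, 0, 0)  # (node, segment index, edges matched in segment)
    seen = {start}
    queue = deque([start])
    while queue:
        u, i, k = queue.popleft()
        if u == target and i == last and k >= 1:
            return True
        for edge_type, v in adj[u]:
            for ni, nk in step(i, k, edge_type):
                state = (v, ni, nk)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return False
```

For example, with edges `a -fa-> b -fa-> c -fn-> d`, the pattern `[("fa", 2), ("fn", None)]` (at most two `fa` edges followed by one or more `fn` edges) connects `a` to `d`, whereas `[("fa", 1), ("fn", None)]` does not. The search explores pairs of graph nodes and pattern positions, so its cost grows with the product of the graph size and the pattern size.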
Author information
Wenfei Fan is professor (Chair) of Web Data Management in the School of Informatics, University of Edinburgh, UK. He is a fellow of the Royal Society of Edinburgh, UK, a national professor of the Thousand-talent Program and a Yangtze River Scholar, China. He received his PhD from the University of Pennsylvania, and his MS and BS from Peking University. He has received the Alberto O. Mendelzon Test-of-time Award of ACM PODS 2010, the best paper award for VLDB 2010, the Roger Needham Award in 2008 (UK), the best paper award for ICDE 2007, the best paper of the Year Award for Computer Networks in 2002, and the Career Award in 2001 (USA). His current research interests include database theory and systems, in particular, data quality, data integration, distributed query processing, query languages, recommender systems, social networks, and web services.
Jianzhong Li is a professor and the chairman of the Department of Computer Science and Engineering at the Harbin Institute of Technology, China. He worked at the University of California at Berkeley as a visiting scholar in 1985. From 1986 to 1987 and from 1992 to 1993, he was a scientist in the Information Research Group in the Department of Computer Science at Lawrence Berkeley National Laboratory, USA. He was also a visiting professor at the University of Minnesota at Minneapolis, Minnesota, USA, from 1991 to 1992 and from 1998 to 1999. His current research interests include database management systems, data warehousing, data mining, and wireless sensor networks. He has authored three books and published more than 200 papers in refereed journals and conference proceedings, such as VLDB Journal, Algorithmica, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Parallel and Distributed Systems, Parallel and Distributed Database, SIGMOD, VLDB, ICDE, INFOCOM, and ICDCS. He has been involved in the program committees of major computer science and technology conferences, including SIGMOD, VLDB, ICDE, INFOCOM, ICDCS, and WWW. He has also served on the editorial boards of distinguished journals, including Knowledge and Data Engineering, and refereed papers for various journals and proceedings.
Shuai Ma is a professor in the School of Computer Science and Engineering, Beihang University. He obtained two PhD degrees, from the University of Edinburgh in 2010 and from Peking University in 2004. He was a postdoctoral research fellow in the database group, University of Edinburgh, and a consultant at Bell Labs, Murray Hill, USA in the summer of 2008. His research interests include database theory and systems, data cleaning, graph matching, social data analysis, and data intensive computing. He has published a number of papers on data quality and graph pattern matching in top database conferences/journals, such as SIGMOD, VLDB, ICDE, WWW, and the VLDB Journal. He is a recipient of the best paper award for VLDB 2010, and the Visiting Young Faculty Program of MSRA in 2012.
Nan Tang received his PhD from the Chinese University of Hong Kong in 2007. Currently, he is a research scientist at QCRI (Qatar Computing Research Institute), Qatar Foundation, Qatar. He worked as a research staff member at CWI, Netherlands, from 2008 to 2010. He has been a research fellow at the University of Edinburgh since 2010. His current research interests include data quality and graph database management.
Yinghui Wu is currently a research scientist in the Department of Computer Science, University of California, Santa Barbara (UCSB). Yinghui received his PhD from the University of Edinburgh, UK in 2010, supervised by Prof. Wenfei Fan. His research interests lie in the areas of database theory and graph database management, with emphasis on graph database models and query languages. He has published papers in SIGMOD, VLDB, ICDE, and ICDT.
Cite this article
Fan, W., Li, J., Ma, S. et al. Adding regular expressions to graph reachability and pattern queries. Front. Comput. Sci. 6, 313–338 (2012). https://doi.org/10.1007/s11704-012-1312-y
DOI: https://doi.org/10.1007/s11704-012-1312-y