Providing built-in keyword search capabilities in RDBMS

  • Regular Paper
The VLDB Journal

Abstract

A common approach to keyword search over relational databases is to find minimum Steiner trees in database graphs derived from the relational data. These methods are expensive, however, because the minimum Steiner tree problem is NP-hard. Furthermore, they operate independently of the underlying relational database management system (RDBMS) and thus cannot benefit from its capabilities. As an alternative, in this paper we propose a new concept called the Compact Steiner Tree (CSTree), which can be used to approximate the Steiner tree problem for answering top-k keyword queries efficiently. We propose a novel structure-aware index, together with an effective ranking mechanism, for fast, progressive and accurate retrieval of the top-k highest-ranked CSTrees. The proposed techniques can be implemented on a standard RDBMS to benefit from its indexing and query-processing capabilities. We have implemented our techniques in MySQL, which can then provide built-in keyword-search capabilities using SQL. The experimental results show a significant improvement in both search efficiency and result quality compared to existing state-of-the-art approaches.
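
To make the idea of answering keyword queries with plain SQL more concrete, the following is a minimal sketch and not the CSTree index or ranking function of the paper, whose details are not given in this abstract. It assumes a toy database graph stored in hypothetical tables `node` and `edge`, a hypothetical keyword index `kw_index`, and a precomputed table `reach` listing every node within one hop of a candidate root. A root is returned only if every query keyword occurs at the root or at one of its neighbours, and roots are ranked by the sum of hop distances. It uses SQLite from Python's standard library in place of MySQL so that it runs without a server.

```python
import sqlite3


def build_demo_db():
    """Create a toy database graph plus hypothetical index tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (id INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE edge (src INTEGER, dst INTEGER);
        -- Assumed keyword index: one row per (keyword, node) occurrence.
        CREATE TABLE kw_index (keyword TEXT, node_id INTEGER);
        -- Assumed structure table: nodes within one hop of each root.
        CREATE TABLE reach (root INTEGER, node INTEGER, dist INTEGER);
        CREATE INDEX kw_index_keyword ON kw_index (keyword);
        CREATE INDEX reach_node ON reach (node);
    """)
    # Illustrative data only: two authors (1, 2) and two papers (3, 4).
    nodes = [(1, "alice"), (2, "bob"), (3, "keyword search"), (4, "steiner tree")]
    edges = [(1, 3), (2, 3), (2, 4)]
    conn.executemany("INSERT INTO node VALUES (?, ?)", nodes)
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    conn.executemany(
        "INSERT INTO kw_index VALUES (?, ?)",
        [(word, nid) for nid, text in nodes for word in text.split()],
    )
    # Each node reaches itself at distance 0 and its neighbours at distance 1
    # (edges are treated as undirected).
    conn.executemany(
        "INSERT INTO reach VALUES (?, ?, ?)",
        [(nid, nid, 0) for nid, _ in nodes]
        + [(a, b, 1) for a, b in edges]
        + [(b, a, 1) for a, b in edges],
    )
    return conn


def top_k(conn, keywords, k):
    """Return up to k (root, cost) pairs whose neighbourhood covers all keywords."""
    words = sorted({w.lower() for w in keywords})
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    sql = f"""
        SELECT r.root, SUM(r.d) AS cost
        FROM (
            -- Closest occurrence of each keyword for each candidate root.
            SELECT reach.root AS root, k.keyword AS keyword,
                   MIN(reach.dist) AS d
            FROM kw_index AS k
            JOIN reach ON reach.node = k.node_id
            WHERE k.keyword IN ({placeholders})
            GROUP BY reach.root, k.keyword
        ) AS r
        GROUP BY r.root
        HAVING COUNT(*) = ?      -- keep roots that cover every keyword
        ORDER BY cost, r.root
        LIMIT ?
    """
    return conn.execute(sql, (*words, len(words), k)).fetchall()
```

Calling `top_k(build_demo_db(), ["alice", "bob"], 3)` returns the single root that connects both authors, namely the paper node 3. The one-hop limit and the sum-of-distances score are simplifications chosen for brevity; because the whole query is a join plus aggregation, the database engine's own indexes and optimizer do the work, which is the general benefit the abstract attributes to an in-RDBMS implementation.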

Author information

Corresponding author

Correspondence to Guoliang Li.

About this article

Cite this article

Li, G., Feng, J., Zhou, X. et al. Providing built-in keyword search capabilities in RDBMS. The VLDB Journal 20, 1–19 (2011). https://doi.org/10.1007/s00778-010-0188-4

  • DOI: https://doi.org/10.1007/s00778-010-0188-4
