Skip to main content
Log in

Using a relational database for scalable XML search

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments. Scalable technologies are needed to effectively manage the growing volumes of XML data. A wide variety of methods exist for storing and searching XML data; the two most common techniques are conventional tree-based and relational approaches. Tree-based approaches represent XML as a tree and use indexes and path join algorithms to process queries. In contrast, the relational approach utilizes the power of a mature relational database to store and search XML. This method relationally maps XML queries to SQL and reconstructs the XML from the database results. To date, the limited acceptance of the relational approach to XML processing is due to the need to redesign the relational schema each time a new XML hierarchy is defined. We, in contrast, describe a relational approach that is fixed schema eliminating the need for schema redesign at the expense of potentially longer runtimes. We show, however, that these potentially longer runtimes are still significantly shorter than those of the tree approach.

We use a popular XML benchmark to compare the scalability of both approaches. We generated large collections of heterogeneous XML documents ranging in size from 500 MB to 8 GB using the XBench benchmark. The scalability of each method was measured by running XML queries that cover a wide range of XML search features on each collection. We measure the scalability of each method over different query features as the collection size increases. In addition, we examine the performance of each method as the result size and the number of predicates increase. Our results show that our relational approach provides a scalable approach to XML retrieval by leveraging existing relational database optimizations. Furthermore, we show that the relational approach typically outperforms the tree-based approach while scaling consistently over all collections studied.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. SQLGenerator. http://ir.iit.edu/projects/SQLGenerator.html. Retrieved on August 2007

  2. DBLP. http://www.informatik.uni-trier.de/~ley/db/. Retrieved on August 2007

  3. Extensible markup language (XML). http://www.w3.org/XML/. Retrieved on August 2007

  4. Open source native XML database. http://exist.sourceforge.net/. Retrieved on August 2007

  5. XBench—a family of benchmarks for XML DBMSs. http://db.uwaterloo.ca/~ddbms/projects/xbench/index.html. Retrieved on August 2007

  6. INitiative for the Evaluation of XML retrieval (INEX) (2007) http://inex.is.informatik.uni-duisburg.de/2007/. Retrieved on August 2007

  7. Afanasiev L, Marx M (2006) An analysis of the current XQuery benchmarks. In: International Workshop on Performance and Evaluation of Data Management Systems (EXPDB)

  8. Amagasa T, Yoshikawa M, Uemura S (2003) QRS: a robust numbering scheme for XML documents. In: Proceedings of the 19th international conference on data engineering (ICDE’03)

  9. Amer-Yahia S, Lalmas M (2006) XML search: languages, INEX and scoring. SIGMOD Rec 35(4):16–23

    Article  Google Scholar 

  10. Beyer KS, Cochrane R, Josifovski V, Kleewein J, Lapis G, Lohman GM, Lyle B, Ozcan F, Pirahesh H, Seemann N, Truong TC, der Linden BV, Vickery B, Zhang C (2005) System RX: one part relational, one part XML. In: SIGMOD conference, pp 347–358

  11. Boncz P, Grust T, van Keulen M, Manegold S, Rittinger J, Teubner J (2006) MonetDB/XQuery: a fast XQuery processor powered by a relational engine. In: SIGMOD international conference on management of data, pp 479–490

  12. Bosak J (2007) Shakespeare XML collection. http://www.ibiblio.org/bosak/xml/eg/. Retrieved on August 2007

  13. Bremer J, Gertz M (2006) Integrating document and data retrieval based on XML. VLDB J 15(1):53–83

    Article  Google Scholar 

  14. Bhme T, Rahm E (2004) Supporting efficient streaming and insertion of XML data in RDBMS. In: 3rd international workshop data integration over the web (DIWeb)

  15. Cathey R, Beitzel S, Jensen E, Grossman D, Frieder O (2007) Relationally mapping XML queries for scalable XML search. In: Proceedings of IEEE conference on the intelligence and security informatics (ISI’07), May 2007

  16. Chien S, Tsotras V, Zaniolo C, Zhang D (2002) Efficient complex query support for multiversion XML documents. In: Proceedings of the EDBT conference

  17. Denoyer L, Gallinari P (2006) The Wikipedia XML corpus. SIGIR Forum 40(1):64–69

    Article  Google Scholar 

  18. Fan W, Yu JX, Lu H, Lu J, Rastogi R (2005) Query translation from XPath to SQL in the presence of recursive DTDs. In: VLDB, pp 337–348

  19. Florescu D, Kossman D (1999) A performance evaluation of alternative mapping schemes for storing XML data in a relational database. Technical report, INRIA, France

  20. Florescu D, Kossman D (1999) Storing and querying XML data using an RDBMS. IEEE Data Eng Bull 22(3):27–34

    Google Scholar 

  21. Gottlob G, Koch C, Pichler R, Segoufin L (2005) The complexity of XPath query evaluation and XML typing. J ACM 52:284–335

    Article  MathSciNet  Google Scholar 

  22. Grohe M (2002) Parameterized complexity for the database theorist. ACM SIGMOD Rec 31:86–96

    Article  Google Scholar 

  23. Grust T (2002) Accelerating XPath location steps. In: ACM SIGMOD international conference on management of data, pp 109–120

  24. Grust T, Sakr S, Teubner J (2004) XQuery on SQL hosts. In: VLDB, pp 252–263

  25. Jiang H, Lu H, Wang W, Yu JX (2002) Path materialization revisited: an efficient storage model for XML data. In: Proceedings of the thirteenth Australian conference on database technologies, vol 5, pp 85–94

  26. Khan L, Rao Y (2001) A performance evaluation of storing XML data in relational database management systems. In: Proceedings of the third international workshop on web information and data management, pp 31–38

  27. Koch C (2006) On the complexity of nonrecursive XQuery and functional query languages on complex values. ACM Trans Database Syst 31:1215–1256

    Article  Google Scholar 

  28. Koch C (2006) Processing queries on tree-structured data efficiently. In: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 213–224

  29. Krishnamurthy R, Chakaravarthy V, Kaushik R, Naughton J (2004) Recursive XML schemas, recursive XML queries, and relational storage: XML-to-SQL query translation. In: IEEE international conference on data engineering (ICDE)

  30. Krishnamurthy R, Kaushik R, Naughton J (2003) XML-to-SQL query translation literature: the state of the art and open problems, September 2003

  31. Krishnaprasad M, Liu ZH, Manikutty A, Warner JW, Arora V, Kotsovolos S (2004) Query rewrite for XML in Oracle XML DB. In: VLDB, pp 1122–1133

  32. Lee Y, Yoo S, Yoon K, Berra P (1996) Index structures for structured documents. In: Proceedings of the 1st ACM international conference on digital libraries, March 1996

  33. Li Q, Moon B (2001) Indexing and querying XML data for regular path expressions. In: Proceedings of the 27th international conference on very large databases, September 2001

  34. Liu ZH, Krishnaprasad M, Arora V (2005) Native XQuery processing in Oracle XMLDB. In: SIGMOD conference, pp 828–833

  35. Manegold S (2006) An empirical evaluation of XQuery processors. In: International workshop on performance and evaluation of data management systems (EXPDB)

  36. Meier W (2002) eXist: an open source native XML database. In: Web, web-services, and database systems. NODe 2002 web- and database-related workshops, October 2002. Springer LNCS Series, 2593

  37. Meier W (2006) Index-driven XQuery processing in the eXist XML database. In: IXML Prague

  38. Pal S, Cseri I, Schaller G, Seeliger O, Giakoumakis L, Zolotov VV (2004) Indexing XML data stored in a relational database. In: VLDB, pp 1134–1145

  39. Papadimitriou CH, Yannakakis M (1997) On the complexity of database queries. In: Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, pp 12–19

  40. Rys M (2004) XQuery in relational database systems. In: XML 2004

  41. Rys M (2005) XML and relational database management systems: inside microsoft SQL server 2005. In: SIGMOD international conference on management of data, pp 958–962

  42. Al-Khalifa SS, Jagadish H, Koudas N, Patel J, Srivastava D, Wu Y (2002) Structural joins: a primitive for efficient XML query pattern matching. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 141–152

  43. Schmidt A, Kersten M, Windhouwer M, Waas F (2001) Efficient relational storage and retrieval of XML documents. Lect Notes Comput Sci 1997:137

    Article  Google Scholar 

  44. Shanmugasundaram J, Kiernan J, Shekita EJ, Fan C, Funderburk JE (2001) Querying XML views of relational data. In: VLDB, pp 261–270

  45. Shanmugasundaram J, Tufte K, He G, Zhang C, DeWitt D, Naughton J (1999) Relational database for querying XML documents: limitations and opportunities. In: Proc of VLDB

  46. Snelson J (2005) All XML databses are equal. XTech 2005: XML, the web and beyond

  47. Suciu D (2001) On database theory and XML. ACM SIGMOD Rec Arch 30(3):39–45

    Article  Google Scholar 

  48. Suizo N (2006) XML propels security intelligence. Network world, August 2006

  49. Tatarinov I, Viglas S, Beyer K, Shanmugasundaram J, Shekita E, Zhang C (2002) Storing and querying ordered XML using a relational database system. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, pp 204–215

  50. Tian F, DeWitt D, Chen J, Zhang C (2002) The design and performance evaluation of alternative XML storage strategies. ACM SIGMOD Rec 31(1):5–10

    Article  Google Scholar 

  51. Vakali A, Catania B, Maddalena A (2005) XML data stores: emerging practices. IEEE Internet Comput 9(2):62–69

    Article  Google Scholar 

  52. Wang H, Meng X (2005) On the sequencing of tree structures for XML indexing. In: ICDE, pp 372–383

  53. Weigel F, Schulz KU, Meuss H (2005) Exploiting native XML indexing techniques for XML retrieval in relational database systems. In: Workshop on web information and data management (WIDM’05), pp 23–30

  54. Weigel F, Schulz KU, Meuss H (2005) The BIRD numbering scheme for XML and tree databases—deciding and reconstructing tree relations using efficient arithmetic operations. In: XSym, pp 49–67

  55. Xing G, Tseng B (2004) Extendible range-based numbering scheme for XML document. In: Proceedings of the international conference on information technology: coding and computing (ITCC’04)

  56. Yoshikawa M, Amagasa T (2001) XRel: a path-based approach to storage and retrieval of XML documents using relational databases. ACM Trans Internet Technol (TOIT) 1(1):110–141

    Article  Google Scholar 

  57. Zhang C, Naughton J, DeWitt D, Luo Q, Lohman G (2001) On supporting containment queries in relational database management systems. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 425–438

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ophir Frieder.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Cathey, R.J., Beitzel, S.M., Jensen, E.C. et al. Using a relational database for scalable XML search. J Supercomput 44, 146–178 (2008). https://doi.org/10.1007/s11227-007-0153-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-007-0153-1

Keywords

Navigation