Abstract
This article proposes a database index structure, the XPath accelerator, that has been specifically designed to support the evaluation of XPath path expressions. As such, the index is capable of supporting all XPath axes (including ancestor, following, preceding-sibling, descendant-or-self, etc.). This feature sets the index apart from related work on XML indexing structures, which focused on the child and descendant axes only. The index has been designed with a close eye on the XPath semantics and with the aim of engineering its internals so that existing relational database query processing technology can support it well: the index (a) permits set-oriented (or, rather, sequence-oriented) path evaluation, and (b) can be implemented and queried using well-established relational index structures, notably B-trees and R-trees. We discuss the implementation of the XPath accelerator on top of different database backends and show that the index performs well on all levels of the memory hierarchy, including disk-based and main-memory based database systems.
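The abstract does not spell out the node encoding, so the following is only a minimal sketch of how set-oriented axis evaluation over a relational table might look. It assumes each node is stored as a row holding its preorder and postorder traversal ranks, so that an axis step becomes a window condition on those two columns. The names `Node`, `encode`, and `axis` are hypothetical, only four of the axes are covered, and the linear scan stands in for the range lookup that a B-tree or R-tree would serve.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory XML element: a tag plus ordered children."""
    tag: str
    children: list = field(default_factory=list)


def encode(root):
    """Return rows (pre, post, tag), sorted by preorder rank."""
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]          # rank assigned on entering the node
        counters["pre"] += 1
        for child in node.children:
            visit(child)
        rows.append((pre, counters["post"], node.tag))  # rank on leaving
        counters["post"] += 1

    visit(root)
    return sorted(rows)


def axis(rows, context, name):
    """Select all rows on the given axis of the context row.

    Each axis is a rectangular region of the (pre, post) plane relative
    to the context node. A real backend would answer this with an index
    range query; a plain scan is used here for illustration only.
    """
    cpre, cpost, _ = context
    windows = {
        "descendant": lambda pre, post: pre > cpre and post < cpost,
        "ancestor": lambda pre, post: pre < cpre and post > cpost,
        "following": lambda pre, post: pre > cpre and post > cpost,
        "preceding": lambda pre, post: pre < cpre and post < cpost,
    }
    inside = windows[name]
    return [row for row in rows if inside(row[0], row[1])]


# Example document: <a><b><c/><d/></b><e><f/></e></a>
doc = Node("a", [Node("b", [Node("c"), Node("d")]), Node("e", [Node("f")])])
rows = encode(doc)
```

Under these assumptions, `axis(rows, rows[1], "descendant")` selects the rows for `c` and `d`, the two elements below `b`. Because every axis is expressed as a predicate over the same two columns, a whole sequence of context nodes could be processed with one join rather than node-at-a-time navigation.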
Index Terms
- Accelerating XPath evaluation in any RDBMS