Growing triples on trees: an XML-RDF hybrid model for annotated documents

The VLDB Journal

Abstract

Since the beginning of the Semantic Web initiative, significant effort has been invested in finding efficient ways to publish, store, and query metadata on the Web. RDF and SPARQL have become the standard data model and query language, respectively, for describing resources on the Web. Large amounts of RDF data are now available either as stand-alone datasets or as metadata over semi-structured (typically XML) documents. The ability to apply RDF annotations over XML data emphasizes the need to represent and query data and metadata simultaneously. We propose XR, a novel hybrid data model that captures the structural aspects of XML data and the semantics of RDF, and that also enables reasoning about XML data. Our model is general enough to describe pure XML or RDF datasets, as well as RDF-annotated XML data, where any XML node can act as a resource. This data model comes with the XRQ query language, which combines features of both XQuery and SPARQL. To demonstrate the feasibility of this hybrid XML-RDF data management setting, and to validate its practical interest, we have developed an XR platform on top of well-known data management systems for XML and RDF. In particular, the platform features several XRQ query processing algorithms, whose performance is experimentally compared.
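
To make the idea that "any XML node can act as a resource" concrete, here is a minimal Python sketch. It is not the XR platform or the XRQ language: the toy document, the position-based URI scheme in `node_uris`, the property names `:checkedBy` and `:rating`, and the helper `annotated_nodes` are all assumptions made up for illustration. It only shows one possible way to give XML nodes identifiers that RDF-style triples can refer to, and then to look up nodes through their annotations.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; element names and content are illustrative only.
doc = ET.fromstring(
    "<article><title>Budget vote</title><claim>Taxes fell by 3%</claim></article>"
)


def node_uris(root, doc_uri):
    """Map a URI to each element, built from its child-position path.

    The "<doc_uri>#<path>" scheme is an assumption for this sketch,
    not the node identification scheme used by the paper.
    """
    uris = {}

    def visit(node, path):
        uris[f"{doc_uri}#{path}"] = node
        for i, child in enumerate(node, start=1):
            visit(child, f"{path}.{i}")

    visit(root, "1")
    return uris


uris = node_uris(doc, "http://example.org/doc1")

# RDF-style (subject, property, object) triples whose subjects are XML nodes.
triples = [
    ("http://example.org/doc1#1.2", ":checkedBy", ":factChecker1"),
    ("http://example.org/doc1#1.2", ":rating", "false"),
]


def annotated_nodes(uris, triples, prop, value):
    """Return the text of XML nodes annotated with the given property and value."""
    return [
        uris[s].text
        for (s, p, o) in triples
        if p == prop and o == value and s in uris
    ]


result = annotated_nodes(uris, triples, ":rating", "false")
print(result)
```

The lookup in `annotated_nodes` goes from the metadata side (a triple pattern) to the data side (an XML node), which is the kind of combined access the abstract motivates; a real system would of course use proper XML and RDF engines rather than in-memory lists.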

Notes

  1. http://linkeddata.org

  2. http://openannotation.org

  3. http://wikileak.org

  4. http://twitter.com

  5. http://maps.google.com

  6. http://guardian.co.uk/data

  7. http://www.factcheck.org

  8. http://www.politifact.org

  9. http://storyful.com

  10. http://on.ted.com/MarkhamNolan

  11. Here and in the rest of this paper, we adopt the convention that strings starting with : are URIs. Formally, URIs consist of two parts: a namespace and a local name [4], separated by the : symbol. A URI without a specified namespace is of the form :LocalName and is interpreted as referring to a default namespace.

  12. The W3C’s xml:id recommendation [17] makes node identity explicit as an xml:id attribute; however, this has not been widely adopted. We explore the xml:id idea as one option in our implementation (see Sect. 5).

  13. As can be seen in the example, in practice PushJoins also extends the projection list of \(Q_R\) to include the bindings for the variables of \(Q_X\) that exist in \(Q\)’s head but do not exist in \(Q_R\) (e.g., the binding for variable \(\$CA\) in this example). However, to keep the presentation simple, this detail is omitted from the algorithm’s pseudocode.

  14. One could further speed up ViP2P by (\(i\)) indexing its views on the XURI attributes that are passed as bindings from the RDF query and/or (\(ii\)) pushing value joins among \(Q_X\) tree patterns within the materialized views, etc. We did not pursue these alternatives, as they are rather orthogonal to the main purpose of this paper.

  15. This interaction between XURI encoding and RDF-3X performance can be reasonably seen as an “implementation accident”; we only explain it for completeness.

References

  1. Extensible Markup Language (XML) 1.0 (fifth edition). http://www.w3.org/TR/xml/ (2008)

  2. RDF. http://www.w3.org/RDF/ (2004)

  3. RDF Vocabulary Description Language 1.0: RDF Schema. http://www.w3.org/TR/rdf-schema/ (2004)

  4. URIs, URLs, and URNs: Clarifications and Recommendations 1.0. http://www.w3.org/TR/2001/NOTE-uri-clarification-20010921/ (2001)

  5. DBpedia 3.7. http://wiki.dbpedia.org/Downloads37

  6. Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a large ontology from Wikipedia and WordNet. J. Web Sem. 6(3) (2008)

  7. Bischof, S., Decker, S., Krennwallner, T., Lopes, N., Polleres, A.: Mapping between RDF and XML with XSPARQL. Technical report, DERI (2011)

  8. RDF concepts and abstract syntax. http://www.w3.org/TR/rdf-concepts/ (2004)

  9. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading (1995)

  10. RDF Semantics. http://www.w3.org/TR/rdf-mt/ (2004)

  11. OWL 2 web ontology language document overview. http://www.w3.org/TR/owl2-overview/

  12. Amer-Yahia, S., Cho, S., Lakshmanan, L.V., Srivastava, D.: Minimization of tree pattern queries. In: SIGMOD (2001)

  13. SPARQL query language for RDF. http://www.w3.org/TR/rdf-sparql-query/ (2008)

  14. Arion, A., Benzaken, V., Manolescu, I.: XML access modules: towards physical data independence in XML databases. In: XIME-P (2005)

  15. Balmin, A., Özcan, F., Beyer, K.S., Cochrane, R., Pirahesh, H.: A framework for using materialized XPath views in XML query processing. In: VLDB (2004)

  16. XQuery 1.0 and XPath 2.0 data model. http://www.w3.org/xpath-datamodel/ (2010)

  17. xml:id. http://www.w3.org/TR/xml-id (2005)

  18. Tatarinov, I., Viglas, S.D., Beyer, K., Shanmugasundaram, J., Shekita, E., Zhang, C.: Storing and querying ordered XML using a relational database system. In: SIGMOD, pp. 204–215, ACM, New York (2002)

  19. Rys, M.: XML and relational database management systems: inside Microsoft SQL Server. In: SIGMOD, pp. 958–962, ACM, New York (2005)

  20. Chen, L., Bernstein, P., Carlin, P., Filipovic, D., Rys, M., Shamgunov, N., Terwilliger, J., Todic, M., Tomasevic, S., Tomic, D.: Mapping XML to a wide sparse table. In: ICDE, pp. 630–641 (April 2012)

  21. Haas, L.M., Freytag, J.C., Lohman, G.M., Pirahesh, H.: Extensible query processing in Starburst. In: SIGMOD (1989)

  22. Afanasiev, L., Marx, M.: An analysis of XQuery benchmarks. Inf. Syst. 33(2), 155–181 (2008)

  23. SPARQL 1.1 Query Language. http://www.w3.org/TR/sparql11-query/ (2012)

  24. Xu, L., Ling, T.W., Wu, H., Bao, Z.: DDE: from Dewey to a fully dynamic XML labeling scheme. In: SIGMOD (2009)

  25. Cautis, B., Deutsch, A., Onose, N.: XPath rewriting using multiple views: achieving completeness and efficiency. In: WebDB (2008)

  26. Karanasos, K.: View-based techniques for the efficient management of web data. PhD thesis, U. Paris Sud (2012)

  27. Hidders, J.: Satisfiability of XPath expressions. In: DBPL, pp. 21–36 (2003)

  28. Neumann, T., Weikum, G.: The RDF-3X engine for scalable management of RDF data. VLDB J. 19(1), 91–113 (2010)

  29. Schmidt, A., Waas, F., Kersten, M.L., Carey, M.J., Manolescu, I., Busse, R.: XMark: a benchmark for XML data management. In: VLDB, pp. 974–985 (2002)

  30. Franceschet, M.: XPathMark: an XPath benchmark for the XMark generated data. In: XSym (2005)

  31. Karanasos, K., Katsifodimos, A., Manolescu, I., Zoupanos, S.: ViP2P: efficient XML management in DHT networks. In: ICWE (2012)

  32. Manolescu, I., Karanasos, K., Vassalos, V., Zoupanos, S.: Efficient XQuery rewriting using multiple views. In: ICDE (2011)

  33. Graefe, G.: Encapsulation of parallelism in the Volcano query processing system. In: SIGMOD (1990)

  34. Goasdoué, F., Karanasos, K., Katsis, Y., Leblay, J., Manolescu, I., Zampetakis, S.: Growing triples on trees: an XML-RDF hybrid model for annotated documents. In: Brambilla, M., Casati, F., Ceri, S. (eds.) VLDS. Seattle, United States (2011)

  35. Oracle Berkeley DB Java Edition. http://oracle.com/technetwork/database/berkeleydb/

  36. Online experiment site. http://tripleo.saclay.inria.fr/xr/experiments

  37. Kahan, J., Koivunen, M.-R., Prud’hommeaux, E., Swick, R.R.: Annotea: an open RDF infrastructure for shared web annotations. Comput. Netw. 39(5), 589–608 (2002)

  38. Haslhofer, B., Simon, R., Sanderson, R., Van de Sompel, H.: The open annotation collaboration (OAC) model. In: Multimedia on the Web (MMWeb), 2011 Workshop on, pp. 5–9 (Sept. 2011)

  39. Handschuh, S., Staab, S.: Authoring and annotation of Web pages in CREAM. In: WWW (2002)

  40. Yee, K.-P.: CritLink: advanced hyperlinks enable public annotation on the web. In: Computer supported cooperative work (CSCW) (2002)

  41. Dill, S., Eiron, N., Gibson, D., Gruhl, D., Guha, R., Jhingran, A., Kanungo, T., Rajagopalan, S., Tomkins, A., Tomlin, J.A., Zien, J.Y.: SemTag and Seeker: bootstrapping the semantic web via automated semantic annotation. In: WWW (2003)

  42. Vargas-Vera, M., Motta, E., Domingue, J., Lanzoni, M., Stutt, A., Ciravegna, F.: MnM: ontology driven semi-automatic and automatic support for semantic markup. In: EKAW (2002)

  43. Reeve, L., Han, H.: Survey of semantic annotation platforms. In: ACM SAC (2005)

  44. Abiteboul, S., Allard, T., Chatalic, P., Gardarin, G., Ghitescu, A., Goasdoué, F., Manolescu, I., Nguyen, B., Ouazara, M., Somani, A., Travers, N., Vasile, G., Zoupanos, S.: WebContent: efficient P2P warehousing of web data (demonstration). PVLDB (2008)

  45. Microformats. http://microformats.org/

  46. RDF in HTML. http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml (2006)

  47. RDFa Primer. http://www.w3.org/TR/xhtml-rdfa-primer/ (2004)

  48. Karanasos, K., Zoupanos, S.: Viewing a world of annotations through AnnoVIP (demonstration). In: ICDE (2010)

  49. GRDDL. http://www.w3.org/TR/grddl/ (2008)

  50. Akhtar, W., Kopecký, J., Krennwallner, T., Polleres, A.: XSPARQL: traveling between the XML and RDF worlds—and avoiding the XSLT pilgrimage. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M., (eds.), ESWC, volume 5021 of Lecture Notes in Computer Science, pp. 432–447. Springer (2008)

  51. Patel-Schneider, P., Siméon, J.: The Yin/Yang web: XML syntax and RDF semantics. In: WWW (2002)

  52. Robie, J., Garshol, L.M., Newcomb, S., Biezunski, M., Fuchs, M., Miller, L., Brickley, D., Christophides, V., Karvounarakis, G.: The syntactic web. Markup Lang (September 2001)

  53. Corby, O., Kefi Khelif, L., Cherfi, H., Gandon, F., Khelif, K.: Querying the semantic web of data using SPARQL, RDF and XML. Research Report RR-6847, INRIA (2009)

  54. Furche, T., Bry, F., Bolzer, O.: Marriages of convenience: triples and graphs. RDF and XML in web querying. In: Principles and Practice of Semantic Web Reasoning, Springer, Berlin (2005)

  55. Droop, M., Flarer, M., Groppe, J., Groppe, S., Linnemann, V., Pinggera, J., Santner, F., Schier, M., Schöpf, F., Staffler, H., Zugal, S.: Translating XPath queries into SPARQL queries. In: OTM (2007)

  56. Droop, M., Flarer, M., Groppe, J., Groppe, S., Linnemann, V., Pinggera, J., Santner, F., Schier, M., Schöpf, F., Staffler, H., Zugal, S.: Bringing the XML and semantic web worlds closer: transforming XML into RDF and embedding XPath into SPARQL. In: Enterprise Information Systems, Springer, Berlin (2009)

  57. Goasdoué, F., Karanasos, K., Katsis, Y., Leblay, J., Manolescu, I., Zampetakis, S.: Growing triples on trees: an XML-RDF hybrid model for annotated documents. In: BDA (Informal proceedings), Rabat, Morocco (2011)

  58. Goasdoué, F., Karanasos, K., Leblay, J., Manolescu, I.: View selection in semantic web databases. PVLDB 5(2) (2011)

  59. Katsifodimos, A., Manolescu, I., Vassalos, V.: Materialized view selection for XQuery workloads. In: SIGMOD (2012)

  60. Goasdoué, F., Karanasos, K., Katsis, Y., Leblay, J., Manolescu, I., Zampetakis, S.: Fact-checking and analysing the web (demonstration). In: SIGMOD, New York (2013)

Author information

Corresponding author

Correspondence to Julien Leblay.

Additional information

This work was done while K. Karanasos was at Inria Saclay and Y. Katsis was at Inria Saclay and ENS Cachan.

About this article

Cite this article

Goasdoué, F., Karanasos, K., Katsis, Y. et al. Growing triples on trees: an XML-RDF hybrid model for annotated documents. The VLDB Journal 22, 589–613 (2013). https://doi.org/10.1007/s00778-013-0321-2

  • DOI: https://doi.org/10.1007/s00778-013-0321-2
