Abstract
Systems for managing and querying semistructured-data sources often store data in proprietary object repositories or in a tagged-text format. We describe a technique that can use relational database management systems to store and manage semistructured data. Our technique relies on a mapping between the semistructured data model and the relational data model, expressed in a query language called STORED. Given a semistructured data instance, a STORED mapping can be generated automatically using data-mining techniques. We are interested in applying STORED to XML data, which is one instance of semistructured data. We show how a document type definition (DTD), when present, can be exploited to further improve performance.
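The abstract only outlines the idea of mining a relational layout from semistructured data. The Python sketch below illustrates the general shape under loudly stated assumptions. It is not the STORED language and not the paper's mining algorithm. It finds a label set shared by many objects, which is a crude stand-in for data mining, and stores matching objects in a relational table. Everything else goes into a generic edge table. The table names `mapped` and `overflow`, the functions `frequent_label_set`, `store`, and `load`, the `min_support` threshold, and the sample data are all hypothetical.

```python
import sqlite3

# Hypothetical semistructured instance: objects whose labels vary.
people = [
    {"name": "Ann", "phone": "555-1234"},
    {"name": "Bob", "phone": "555-9876"},
    {"name": "Cid", "email": "cid@example.org"},
    {"name": "Dee", "phone": "555-0000", "hobby": "chess"},
]


def frequent_label_set(objects, min_support):
    """Widest label set that occurs in some object and is contained in at
    least `min_support` objects. A crude stand-in for data mining: only
    label sets that actually occur are considered as candidates."""
    candidates = {frozenset(o) for o in objects}
    qualifying = [c for c in candidates
                  if sum(c.issubset(o) for o in objects) >= min_support]
    # Prefer wider sets; sort labels to break ties deterministically.
    return max(qualifying, key=lambda c: (len(c), sorted(c)),
               default=frozenset())


def store(objects, min_support):
    """Put objects that have every mined label into table `mapped`;
    all remaining labels go to a generic edge table `overflow`.
    Labels are assumed to be safe SQL identifiers."""
    cols = sorted(frequent_label_set(objects, min_support))
    db = sqlite3.connect(":memory:")
    defs = ", ".join(["oid INTEGER"] + [f'"{c}" TEXT' for c in cols])
    db.execute(f"CREATE TABLE mapped ({defs})")
    db.execute("CREATE TABLE overflow (oid INTEGER, label TEXT, value TEXT)")
    marks = ", ".join("?" * (len(cols) + 1))  # e.g. "?, ?, ?"
    for oid, obj in enumerate(objects):
        if set(cols).issubset(obj):
            db.execute(f"INSERT INTO mapped VALUES ({marks})",
                       [oid] + [obj[c] for c in cols])
            rest = {k: v for k, v in obj.items() if k not in cols}
        else:
            rest = obj
        db.executemany("INSERT INTO overflow VALUES (?, ?, ?)",
                       [(oid, k, v) for k, v in rest.items()])
    return db


def load(db):
    """Reassemble objects (those with at least one stored label) from both tables."""
    cur = db.execute("SELECT * FROM mapped")
    cols = [d[0] for d in cur.description][1:]
    objs = {row[0]: dict(zip(cols, row[1:])) for row in cur}
    for oid, label, value in db.execute("SELECT oid, label, value FROM overflow"):
        objs.setdefault(oid, {})[label] = value
    return [objs[k] for k in sorted(objs)]


if __name__ == "__main__":
    db = store(people, min_support=3)
    print(db.execute("SELECT * FROM mapped ORDER BY oid").fetchall())
    # [(0, 'Ann', '555-1234'), (1, 'Bob', '555-9876'), (3, 'Dee', '555-0000')]
    print(db.execute("SELECT * FROM overflow ORDER BY oid, label").fetchall())
    # [(2, 'email', 'cid@example.org'), (2, 'name', 'Cid'), (3, 'hobby', 'chess')]
    print(load(db) == people)  # True
```

The split keeps regular data in ordinary columns, which a relational engine can index and query efficiently. Irregular data still survives in the edge table. If a DTD were available, a sketch like this could read the column set from the element declarations instead of mining it. That substitution is an illustrative assumption, not a description of how the paper uses DTDs.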
Storing semistructured data with STORED. SIGMOD '99: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data.