ABSTRACT
XML provides a universal and portable format for document and data exchange. While the syntax and specification of XML make documents both human-readable and machine-parsable, this often comes at the expense of efficiency when representing simple data structures. We investigate the "costs" associated with XML serialization from several resource perspectives: storage, transport, processing, and human readability. These experiments are done within the context of a large text-centric service-oriented architecture, IBM's WebFountain project. We find that for several applications, human-readable formats outperform binary equivalents, especially in the area of data size, and that the costs of processing encoded binary data often exceed those of processing terse human-readable formats.
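As a rough illustration of the data-size claim, the following sketch compares two ways of embedding a list of integers in an XML element: a terse space-separated decimal string, and a base64 encoding of packed 32-bit integers. It is not taken from the paper's experiments; the element name `values`, the helper names, and the choice of 32-bit big-endian packing are all assumptions made for the example. The result depends on the value distribution: small values favor the text form, while values near the 32-bit limit favor the binary form.

```python
import base64
import struct


def as_text(values):
    """Terse human-readable form: decimal numbers separated by single spaces."""
    return " ".join(str(v) for v in values)


def as_base64(values):
    """Binary form: big-endian signed 32-bit integers, base64-encoded so the
    bytes can be carried safely inside XML character data."""
    packed = struct.pack(f">{len(values)}i", *values)
    return base64.b64encode(packed).decode("ascii")


def from_base64(payload):
    """Inverse of as_base64, used to confirm the encoding is lossless."""
    packed = base64.b64decode(payload)
    return list(struct.unpack(f">{len(packed) // 4}i", packed))


def wrap(payload):
    """Wrap a payload in a hypothetical <values> element."""
    return f"<values>{payload}</values>"


if __name__ == "__main__":
    small = list(range(100))      # small values: few decimal digits each
    large = [2**31 - 1] * 100     # large values: ten decimal digits each

    for name, data in (("small", small), ("large", large)):
        text_xml = wrap(as_text(data))
        b64_xml = wrap(as_base64(data))
        print(f"{name}: text={len(text_xml)} chars, base64={len(b64_xml)} chars")
```

Base64 expands every 3 bytes into 4 characters, so each 32-bit integer costs a little over 5 characters regardless of its magnitude, whereas the decimal form costs one character per digit plus a separator.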
Index Terms
- A case study on alternate representations of data structures in XML