DOI: 10.1145/1096601.1096652

A case study on alternate representations of data structures in XML

Published: 02 November 2005

ABSTRACT

XML provides a universal and portable format for document and data exchange. While the syntax and specification of XML make documents both human readable and machine parsable, this often comes at the expense of efficiency when representing simple data structures. We investigate the "costs" associated with XML serialization from several resource perspectives: storage, transport, processing, and human readability. These experiments are done within the context of a large text-centric service-oriented architecture, IBM's WebFountain project. We find that for several applications, human-readable formats outperform binary equivalents, especially in data size, and that the cost of processing encoded binary data often exceeds that of processing terse human-readable formats.
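
To make the data-size claim concrete, the following is a minimal sketch, not the paper's actual experiment. It assumes a hypothetical `<ints>` element, a list of small integers, and a binary encoding of fixed-width 32-bit little-endian values wrapped in base64 so that it can be embedded in XML. The function names `as_text_xml` and `as_base64_xml` are illustrative only.

```python
import base64
import struct


def as_text_xml(values):
    """Serialize integers as space-separated decimal text (hypothetical <ints> element)."""
    body = " ".join(str(v) for v in values)
    return f"<ints>{body}</ints>".encode("ascii")


def as_base64_xml(values):
    """Serialize integers as packed 32-bit little-endian binary, base64-encoded for XML."""
    packed = struct.pack(f"<{len(values)}i", *values)
    body = base64.b64encode(packed).decode("ascii")
    return f"<ints>{body}</ints>".encode("ascii")


# Assumption: small values, which take few decimal digits each.
values = list(range(100))
text_doc = as_text_xml(values)
b64_doc = as_base64_xml(values)

print("text bytes:  ", len(text_doc))
print("base64 bytes:", len(b64_doc))
```

Under these assumptions the decimal text form is smaller, because each small value costs two or three characters including its separator, while each fixed-width value costs four bytes before the roughly one-third base64 overhead. The comparison can go the other way for large values, so this illustrates the kind of trade-off the abstract describes and does not reproduce its measurements.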


Published in

DocEng '05: Proceedings of the 2005 ACM symposium on Document engineering
November 2005
252 pages
ISBN: 1595932402
DOI: 10.1145/1096601

Copyright © 2005 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

Article

Acceptance Rates

Overall acceptance rate: 178 of 537 submissions, 33%
