
KF-Diff+: Highly Efficient Change Detection Algorithm for XML Documents

  • Conference paper
On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE (OTM 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2519)

Abstract

Most previous work on change detection for XML documents uses the ordered-tree model, where the best known complexity is O(n log n), with n the size of the document. The best algorithm we know of for the unordered model runs in polynomial time. In this paper, we propose a highly efficient algorithm named KF-Diff+. Its key idea is to transform the traditional tree-to-tree correction problem into a comparison of key trees, which are essentially labeled trees without duplicate paths. This comparison runs in O(n) time, where n is the number of nodes in the trees. In addition, KF-Diff+ handles both ordered and unordered trees. Experiments show that KF-Diff+ processes XML documents very quickly.
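To make the key-tree idea more concrete, here is a minimal Python sketch built on our own assumptions. It is not the authors' KF-Diff+ procedure. We read "labeled trees without duplicate paths" as trees in which no two siblings share a label, so each node is uniquely identified by its root-to-node label path. Under that reading, two key trees can be compared by indexing every node by its path in a single traversal and then matching the two indexes with hash lookups. The tuple-based node representation and the names `index_paths` and `diff_key_trees` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical node representation: (label, text, children).
# Assumption: sibling labels are unique, so every node has a distinct label path.
Node = Tuple[str, Optional[str], List["Node"]]
Path = Tuple[str, ...]


def index_paths(root: Node) -> Dict[Path, Optional[str]]:
    """Map each root-to-node label path to that node's text, visiting each node once."""
    index: Dict[Path, Optional[str]] = {}
    stack = [((root[0],), root)]
    while stack:
        path, (_label, text, children) = stack.pop()
        if path in index:
            # Two nodes share a path, so the input is not a key tree.
            raise ValueError(f"duplicate path {path}: not a key tree")
        index[path] = text
        for child in children:
            stack.append((path + (child[0],), child))
    return index


def diff_key_trees(old: Node, new: Node) -> Tuple[List[Path], List[Path], List[Path]]:
    """Return (inserted, deleted, updated) paths via hash lookups.

    Each node is visited once. In this simple sketch, building and hashing the
    tuple keys also costs time proportional to each node's depth.
    """
    a, b = index_paths(old), index_paths(new)
    inserted = [p for p in b if p not in a]
    deleted = [p for p in a if p not in b]
    updated = [p for p in a if p in b and a[p] != b[p]]
    return inserted, deleted, updated


if __name__ == "__main__":
    old = ("book", None, [("title", "XML", []), ("year", "2001", [])])
    new = ("book", None, [("year", "2002", []), ("title", "XML", []), ("isbn", "123", [])])
    print(diff_key_trees(old, new))
```

With these inputs, the sketch reports `('book', 'isbn')` as inserted and `('book', 'year')` as updated. Nodes are matched by path rather than by position, so reordering `title` and `year` produces no change. That behavior fits the unordered model. The abstract does not say how KF-Diff+ itself treats sibling order for ordered trees.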




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, H., Wu, Q., Wang, H., Yang, G., Jia, Y. (2002). KF-Diff+: Highly Efficient Change Detection Algorithm for XML Documents. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE. OTM 2002. Lecture Notes in Computer Science, vol 2519. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36124-3_80


  • DOI: https://doi.org/10.1007/3-540-36124-3_80

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00106-5

  • Online ISBN: 978-3-540-36124-4

  • eBook Packages: Springer Book Archive
