ABSTRACT
XML has experienced rapid growth, mostly because of its use on the Web. Its applications range from version control management and data storage to clustering and information retrieval. In this context, efficient techniques for comparing XML documents are needed. Many of the proposed methods rely only on structural commonalities and ignore semantics. In this paper, we propose a new method for comparing XML documents, based on LevelEdge, that combines the structural and semantic similarities of tags.
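The abstract does not define the LevelEdge representation or how the two similarities are combined, so the following is only a rough sketch of the general idea and not the paper's method. It assumes that a document can be summarized as a set of (level, parent tag, child tag) edges, that two edges are comparable only when they sit on the same level, and that tag semantics come from a small hypothetical synonym table (`SYNONYMS`, with an arbitrary score of 0.8) standing in for a real lexical resource. The function names and the averaging scheme are likewise assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym table; a real system would consult a lexical resource.
SYNONYMS = {frozenset({"author", "writer"})}


def level_edges(xml_text):
    """Collect (level, parent_tag, child_tag) triples (an assumed edge summary)."""
    root = ET.fromstring(xml_text)
    edges = set()
    stack = [(root, 0)]
    while stack:
        node, level = stack.pop()
        for child in node:
            edges.add((level, node.tag, child.tag))
            stack.append((child, level + 1))
    return edges


def tag_sim(a, b):
    """Semantic score for two tags: 1.0 if equal, 0.8 if listed as synonyms."""
    if a == b:
        return 1.0
    return 0.8 if frozenset({a, b}) in SYNONYMS else 0.0


def edge_sim(e1, e2):
    """Edges on different levels never match; otherwise multiply tag scores."""
    if e1[0] != e2[0]:
        return 0.0
    return tag_sim(e1[1], e2[1]) * tag_sim(e1[2], e2[2])


def similarity(doc_a, doc_b):
    """Average, over both edge sets, of each edge's best match in the other set."""
    ea, eb = level_edges(doc_a), level_edges(doc_b)
    if not ea and not eb:
        return 1.0
    if not ea or not eb:
        return 0.0
    best_a = sum(max(edge_sim(x, y) for y in eb) for x in ea)
    best_b = sum(max(edge_sim(y, x) for x in ea) for y in eb)
    return (best_a + best_b) / (len(ea) + len(eb))


if __name__ == "__main__":
    a = "<book><title/><author/></book>"
    b = "<book><title/><writer/></book>"
    print(similarity(a, b))
```

Under these assumptions, identical documents score 1.0, documents with no comparable edges score 0.0, and a purely structural measure would be the special case in which `tag_sim` returns 1.0 only for equal tags.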
Index Terms
- Structural and semantic similarity for XML comparison