Article

XRules: an effective structural classifier for XML data

Authors:
Mohammed J. Zaki (Rensselaer Polytechnic Institute), Charu C. Aggarwal (IBM T. J. Watson Research Center)

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, August 2003, Pages 316–325. https://doi.org/10.1145/956750.956787

Published: 24 August 2003

ABSTRACT

XML documents have recently become ubiquitous because they are useful in a wide range of applications. Classification is an important problem in data mining, but current classification methods for XML documents are IR-based and treat each document as a bag of words. Such techniques ignore a significant amount of information hidden inside the documents. In this paper we discuss rule-based classification of XML data using frequent discriminatory substructures within XML documents. Such a technique is better able to capture the classification characteristics of documents. The technique can also be extended to cost-sensitive classification. We show the effectiveness of the method with respect to other classifiers. We note that the methodology discussed in this paper is applicable to any kind of semi-structured data.
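
To make the idea of rule-based classification over substructures concrete, here is a minimal Python sketch. It is not the paper's algorithm: as a simplifying assumption it uses root-to-node tag paths as a stand-in for the "frequent discriminatory substructures" the abstract mentions, and the thresholds `min_support` and `min_confidence`, the function names, and the toy documents are all hypothetical. Support is taken here as the fraction of a class's documents containing a path, and confidence as the fraction of documents containing the path that belong to the class.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def mine_rules(docs, min_support=0.5, min_confidence=0.7):
    """Build (path, label, confidence) rules from labeled XML documents.

    docs is a list of (xml_text, label) pairs. A path becomes a rule for a
    label when it is frequent within that class and mostly confined to it.
    """
    class_sizes = Counter(label for _, label in docs)
    total = Counter()                 # path -> number of documents containing it
    per_class = defaultdict(Counter)  # label -> path -> number of documents
    for text, label in docs:
        for path in tag_paths(text):
            total[path] += 1
            per_class[label][path] += 1

    rules = []
    for label, counts in per_class.items():
        for path, n in counts.items():
            support = n / class_sizes[label]
            confidence = n / total[path]
            if support >= min_support and confidence >= min_confidence:
                rules.append((path, label, confidence))
    # Strongest rules first; ties broken by path and label for determinism.
    rules.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rules


def classify(xml_text, rules, default):
    """Return the label of the first (strongest) rule whose path occurs."""
    paths = tag_paths(xml_text)
    for path, label, _confidence in rules:
        if path in paths:
            return label
    return default


# Toy training set (hypothetical documents and labels).
train = [
    ("<order><customer/><item><price/></item></order>", "sale"),
    ("<order><customer/><item><price/><discount/></item></order>", "sale"),
    ("<order><customer/><refund><reason/></refund></order>", "return"),
    ("<order><refund><reason/></refund></order>", "return"),
]
rules = mine_rules(train)
print(classify("<order><item><price/></item><note/></order>", rules, "unknown"))
```

In this toy data the tags `order` and `customer` appear in both classes, so a bag-of-tags view gets little from them, while paths such as `order/item` and `order/refund` separate the classes cleanly. A cost-sensitive variant could, for instance, weight each rule's confidence by a per-class misclassification cost before sorting; that extension is not shown here.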

Index Terms

1. Information systems
  1. Information systems applications

Published in
KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2003
736 pages
ISBN: 1581137370
DOI: 10.1145/956750
Conference Chair: Lise Getoor (University of Maryland, College Park)
General Chair: Ted Senator (DARPA)
Program Chairs: Pedro Domingos (University of Washington), Christos Faloutsos (Carnegie Mellon University)
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
XML/Semi-structured data
classification
tree mining
Qualifiers
- Article
Conference

Acceptance Rates
KDD '03 Paper Acceptance Rate: 46 of 298 submissions, 15%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%