DOI: 10.1145/956750.956787
Article

XRules: an effective structural classifier for XML data

Published: 24 August 2003

ABSTRACT

XML documents have recently become ubiquitous because of their applicability across a wide range of applications. Classification is an important problem in the data mining domain, but current classification methods for XML documents use IR-based methods in which each document is treated as a bag of words. Such techniques ignore a significant amount of information hidden inside the documents. In this paper we discuss the problem of rule-based classification of XML data by using frequent discriminatory substructures within XML documents. Such a technique is better able to capture the classification characteristics of documents. In addition, the technique can be extended to cost-sensitive classification. We show the effectiveness of the method with respect to other classifiers. We note that the methodology discussed in this paper is applicable to any kind of semi-structured data.
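To make the idea of classifying by discriminatory substructures concrete, the following is a minimal Python sketch. It is not the XRules algorithm: the abstract speaks of frequent substructures in general, and this toy version narrows that to root-to-node tag paths so the example stays short. The function names (`tag_paths`, `mine_rules`, `classify`), the thresholds, and the sample documents are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def mine_rules(docs, labels, min_support=0.5, min_confidence=0.6):
    """Build (path, class, confidence) rules from paths frequent within a class.

    Hypothetical simplification: support is measured per class and
    confidence is the fraction of documents containing the path that
    belong to that class.
    """
    class_sizes = Counter(labels)
    path_class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for path in tag_paths(doc):
            path_class_counts[path][label] += 1

    rules = []
    for path, counts in path_class_counts.items():
        total = sum(counts.values())
        for label, n in counts.items():
            support = n / class_sizes[label]
            confidence = n / total
            if support >= min_support and confidence >= min_confidence:
                rules.append((path, label, confidence))
    return rules


def classify(xml_text, rules, default):
    """Sum the confidence of all matching rules per class; return the best class."""
    paths = tag_paths(xml_text)
    scores = Counter()
    for path, label, confidence in rules:
        if path in paths:
            scores[label] += confidence
    return scores.most_common(1)[0][0] if scores else default


# Made-up training documents: two structural "classes" with no text content.
docs = [
    "<order><item><price/></item></order>",
    "<order><item><price/><qty/></item></order>",
    "<log><entry><time/></entry></log>",
    "<log><entry><time/><user/></entry></log>",
]
labels = ["order", "order", "log", "log"]
rules = mine_rules(docs, labels)

print(classify("<order><item/></order>", rules, default="unknown"))
print(classify("<log><entry/></log>", rules, default="unknown"))
```

The sample documents contain no words at all, so a bag-of-words classifier would have nothing to work with, while the structural rules still separate the two classes. A cost-sensitive variant could, under the same assumptions, weight each class score by a misclassification cost before picking the maximum.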

    • Published in

      KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2003
      736 pages
      ISBN:1581137370
      DOI:10.1145/956750

      Copyright © 2003 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      KDD '03 Paper Acceptance Rate: 46 of 298 submissions, 15%
      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
