Learning Ontologies to Improve Text Clustering and Classification

  • Conference paper
From Data and Information Analysis to Knowledge Engineering

Abstract

Recent work has shown improvements in text clustering and classification tasks by integrating conceptual features extracted from ontologies. In this paper we present text mining experiments in the medical domain in which the ontological structures used are acquired automatically, in an unsupervised learning process, from the text corpus in question. We compare results obtained using the automatically learned ontologies with those obtained using manually engineered ones. Our results show that both types of ontologies improve results on text clustering and classification tasks, with the automatically acquired ontologies yielding an improvement competitive with that of the manually engineered ones.
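To make the idea of "integrating conceptual features" concrete, here is a minimal sketch, not the authors' pipeline. It assumes a tiny hand-written taxonomy (`TOY_PARENT`) and two hypothetical helper functions (`concept_ancestors`, `augment_with_concepts`); none of these names come from the paper. Each document's term counts are extended with counts for the ancestor concepts of its terms, so that documents using different but related terms end up sharing features.

```python
from collections import Counter

# Hypothetical toy taxonomy (an assumption for illustration only):
# each term or concept maps to its parent concept.
TOY_PARENT = {
    "aspirin": "analgesic",
    "ibuprofen": "analgesic",
    "analgesic": "drug",
    "insulin": "hormone",
    "hormone": "drug",
}


def concept_ancestors(term, parent=TOY_PARENT, max_levels=2):
    """Return up to max_levels ancestors of term, nearest first."""
    ancestors = []
    current = term
    for _ in range(max_levels):
        current = parent.get(current)
        if current is None:
            break
        ancestors.append(current)
    return ancestors


def augment_with_concepts(tokens, max_levels=2):
    """Count terms, then add one count to each ancestor concept per token occurrence."""
    features = Counter(tokens)
    for token in tokens:
        for concept in concept_ancestors(token, max_levels=max_levels):
            # The prefix keeps concept features distinct from plain terms.
            features["concept:" + concept] += 1
    return features


doc_a = augment_with_concepts(["aspirin", "dose"])
doc_b = augment_with_concepts(["ibuprofen", "dose"])

# Features the two documents have in common after augmentation.
shared = sorted(set(doc_a) & set(doc_b))
print(shared)
```

Without the concept features, the two toy documents overlap only on the term "dose"; with them, they also share the concepts above "aspirin" and "ibuprofen". Whether the taxonomy is hand-built or learned from the corpus, the augmentation step in this sketch stays the same; only the source of `TOY_PARENT` changes.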




© 2006 Springer Berlin · Heidelberg

About this paper

Cite this paper

Bloehdorn, S., Cimiano, P., Hotho, A. (2006). Learning Ontologies to Improve Text Clustering and Classification. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_40
