Learning Ontologies to Improve Text Clustering and Classification

Bloehdorn, Stephan; Cimiano, Philipp; Hotho, Andreas

doi:10.1007/3-540-31314-1_40

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Recent work has shown improvements in text clustering and classification tasks by integrating conceptual features extracted from ontologies. In this paper we present text mining experiments in the medical domain in which the ontological structures used are acquired automatically, in an unsupervised learning process, from the text corpus in question. We compare results obtained using the automatically learned ontologies with those obtained using manually engineered ones. Our results show that both types of ontologies improve results on text clustering and classification tasks, and that the automatically acquired ontologies yield an improvement competitive with that of the manually engineered ones.
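
The abstract does not spell out how conceptual features are combined with term features, so the following is only a minimal sketch of one common way to do it: each term found in a taxonomy also contributes counts for its ancestor concepts. The taxonomy, the function names, the `concept:` prefix, and the `max_levels` parameter are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy taxonomy (term -> parent concept); not taken from the paper.
TOY_TAXONOMY = {
    "aspirin": "analgesic",
    "ibuprofen": "analgesic",
    "analgesic": "drug",
    "insulin": "hormone",
    "hormone": "drug",
}


def ancestors(term, taxonomy, max_levels=2):
    """Return up to max_levels ancestors of term, nearest first."""
    found = []
    current = term
    while len(found) < max_levels and current in taxonomy:
        current = taxonomy[current]
        found.append(current)
    return found


def term_and_concept_features(tokens, taxonomy, max_levels=2):
    """Count terms, then add one count per ancestor concept of each term."""
    features = Counter(tokens)
    for token in tokens:
        for concept in ancestors(token, taxonomy, max_levels):
            # Prefix concept features so they cannot collide with plain terms.
            features["concept:" + concept] += 1
    return features


# Two documents that share only one surface term ("dose").
doc_a = term_and_concept_features(["aspirin", "dose"], TOY_TAXONOMY)
doc_b = term_and_concept_features(["ibuprofen", "dose"], TOY_TAXONOMY)

# Features present in both documents after concept expansion.
shared = sorted(set(doc_a) & set(doc_b))
print(shared)
```

In this toy setting the two documents overlap on concept features as well as on the shared term, which is the kind of effect that could help a clustering or classification algorithm downstream. Whether the taxonomy is hand-built or learned from the corpus does not change this step.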

Author information

Authors and Affiliations

Institute AIFB, University of Karlsruhe, D-76128, Karlsruhe, Germany
Stephan Bloehdorn & Philipp Cimiano
KDE Group, University of Kassel, D-34321, Kassel, Germany
Andreas Hotho

Editor information

Editors and Affiliations

Institut für Technische und Betriebliche Informationssysteme, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Myra Spiliopoulou
Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Rudolf Kruse, Christian Borgelt & Andreas Nürnberger
Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), 76128, Karlsruhe
Wolfgang Gaul

About this paper

Cite this paper

Bloehdorn, S., Cimiano, P., Hotho, A. (2006). Learning Ontologies to Improve Text Clustering and Classification. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_40

DOI: https://doi.org/10.1007/3-540-31314-1_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics (R0)
