Abstract
Recent work has shown improvements in text clustering and classification tasks by integrating conceptual features extracted from ontologies. In this paper we present text mining experiments in the medical domain in which the ontological structures used are acquired automatically, in an unsupervised learning process, from the text corpus in question. We compare results obtained using the automatically learned ontologies with those obtained using manually engineered ones. Our results show that both types of ontologies improve results on text clustering and classification tasks, and that the automatically acquired ontologies yield an improvement competitive with that of the manually engineered ones.
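As a rough illustration of what "integrating conceptual features" can mean, the sketch below extends a bag-of-words representation with counts for ancestor concepts taken from an is-a hierarchy. It is a minimal sketch and not the procedure used in the paper: the hierarchy `TOY_HYPERNYMS`, the function `concept_features`, the `max_levels` parameter, and the `term:`/`concept:` key prefixes are all hypothetical names chosen for this example.

```python
from collections import Counter

# Hypothetical toy is-a hierarchy (child -> parent); not taken from the paper.
TOY_HYPERNYMS = {
    "aspirin": "analgesic",
    "ibuprofen": "analgesic",
    "analgesic": "drug",
    "insulin": "hormone",
}


def concept_features(tokens, hypernyms, max_levels=2):
    """Count each term and up to max_levels of its ancestor concepts."""
    features = Counter()
    for token in tokens:
        features["term:" + token] += 1
        concept = token
        for _ in range(max_levels):
            concept = hypernyms.get(concept)
            if concept is None:
                break  # reached the top of the hierarchy for this term
            features["concept:" + concept] += 1
    return features


# Two documents with different drug names share no drug term,
# but they overlap on the added concept features.
doc_a = concept_features(["aspirin", "dose"], TOY_HYPERNYMS)
doc_b = concept_features(["ibuprofen", "dose"], TOY_HYPERNYMS)
shared = sorted(set(doc_a) & set(doc_b))
```

Here `shared` contains the common term `dose` plus the concepts `analgesic` and `drug`, so a similarity measure computed on these vectors can relate the two documents even though their drug names differ. In this sketch the only thing that distinguishes a hand-built ontology from an automatically learned one is the source of the hierarchy passed in.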
Copyright information
© 2006 Springer Berlin · Heidelberg
About this paper
Cite this paper
Bloehdorn, S., Cimiano, P., Hotho, A. (2006). Learning Ontologies to Improve Text Clustering and Classification. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_40
DOI: https://doi.org/10.1007/3-540-31314-1_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics (R0)