Semantic Distance Measures with Distributional Profiles of Coarse-Grained Concepts

Hirst, Graeme; Mohammad, Saif

doi:10.1007/978-3-642-22613-7_4

Part of the book series: Studies in Computational Intelligence (SCI, volume 370)

Abstract

Although semantic distance measures are applied to words in textual tasks such as building lexical chains, semantic distance is really a property of concepts, not words. After discussing the limitations of measures based solely on lexical resources such as WordNet or solely on distributional data from text corpora, we present a hybrid measure of semantic distance based on distributional profiles of concepts that we infer from corpora. We use only a very coarse-grained inventory of concepts—each category of a published thesaurus is taken as a single concept—and yet we obtain results on basic semantic-distance tasks that are better than those of methods that use only distributional data and are generally as good as those that use fine-grained WordNet-based measures. Because the measure is based on naturally occurring text, it is able to find word pairs that stand in non-classical relationships not found in WordNet. It can be applied cross-lingually, using a thesaurus in one language to measure semantic distance between words in another. In addition, we show the use of the method in determining the degree of antonymy of word pairs.
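
The idea of a distributional profile of a concept can be illustrated with a minimal sketch. It is not the authors' implementation: the toy thesaurus, the toy corpus, the window size, the use of raw co-occurrence counts, and the cosine comparison are all assumptions made for illustration, and the names `THESAURUS`, `CORPUS`, `build_profiles`, `cosine`, and `word_distance` are hypothetical. Each thesaurus category is treated as one coarse-grained concept, its profile is the bag of words that co-occur with any word listed under it, and the distance between two words is taken from their closest pair of categories.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy thesaurus: category (coarse concept) -> words listed under it.
THESAURUS = {
    "FINANCE": {"bank", "money", "loan"},
    "RIVER": {"bank", "shore", "water"},
    "ANIMAL": {"dog", "cat"},
}

# Hypothetical toy corpus, one sentence per string.
CORPUS = [
    "the bank approved the loan",
    "money in the bank",
    "the shore of the river had water",
    "the dog chased the cat",
]

def build_profiles(sentences, thesaurus, window=2):
    """For each category, count the words that co-occur with its member words."""
    word_to_cats = defaultdict(set)
    for cat, words in thesaurus.items():
        for w in words:
            word_to_cats[w].add(cat)
    profiles = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            # An ambiguous word credits every category that lists it.
            for cat in word_to_cats.get(tok, ()):
                profiles[cat].update(context)
    return profiles

def cosine(p, q):
    """Cosine similarity between two count profiles (0.0 if either is empty)."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def word_distance(w1, w2, profiles, thesaurus):
    """Distance between words = 1 - similarity of their closest pair of concepts."""
    cats1 = [c for c, ws in thesaurus.items() if w1 in ws]
    cats2 = [c for c, ws in thesaurus.items() if w2 in ws]
    sims = [cosine(profiles[a], profiles[b]) for a in cats1 for b in cats2]
    return 1.0 - max(sims, default=0.0)

profiles = build_profiles(CORPUS, THESAURUS)
```

Calling `word_distance("money", "loan", profiles, THESAURUS)` compares the FINANCE profile with itself, while a word absent from the thesaurus has no categories and falls back to the maximum distance of 1.0. A realistic system would replace the raw counts with a strength-of-association statistic and use a published thesaurus and a large corpus.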

Author information

Authors and Affiliations

Department of Computer Science, University of Toronto, Toronto, Ontario, Canada, M5S 3G4
Graeme Hirst
Institute for Information Technology, National Research Council Canada, Ottawa, Ontario, Canada, K1A 0R6
Saif Mohammad

Editor information

Editors and Affiliations

Faculty of Linguistics and Literature, Bielefeld University, Universitätsstraße 25, 33615, Bielefeld, Germany
Alexander Mehler
Institute of Cognitive Science, University of Osnabrück, Albrechtstr. 28, 49076, Osnabrück, Germany
Kai-Uwe Kühnberger
Angewandte Sprachwissenschaft und Computerlinguistik, Justus-Liebig-Universität Gießen, Otto-Behaghel-Straße 10D, 35394, Gießen, Germany
Henning Lobin & Harald Lüngen
Institut für deutsche Sprache und Literatur, Technical University Dortmund, Emil-Figge-Straße 50, 44227, Dortmund, Germany
Angelika Storrer
SFB 441 Linguistic Data Structures, Eberhard Karls Universität Tübingen, Nauklerstraße 35, 72074, Tübingen, Germany
Andreas Witt

About this chapter

Cite this chapter

Hirst, G., Mohammad, S. (2011). Semantic Distance Measures with Distributional Profiles of Coarse-Grained Concepts. In: Mehler, A., Kühnberger, KU., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_4

DOI: https://doi.org/10.1007/978-3-642-22613-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22612-0
Online ISBN: 978-3-642-22613-7
eBook Packages: Engineering, Engineering (R0)
