Word sense disambiguation methods

Programming and Computer Software

Abstract

Word sense disambiguation is one of the key tasks of text processing. It consists in determining the senses of words or compound terms according to the context in which they are used. The problem originated in the 1950s as a subtask of machine translation. Since then, a great number of methods for solving it have been developed; however, none of them can be considered perfect. This paper surveys the most well-known studies in the field.
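
To make the task concrete, the following is a minimal sketch of choosing a sense from context by counting the words shared between the context and each sense's dictionary gloss, in the spirit of simplified Lesk-style overlap methods. It is an illustration only, not a method taken from the abstract: the sense inventory `SENSES`, its glosses, the stopword list, and the function names `tokenize` and `disambiguate` are all hypothetical.

```python
# Toy sense inventory: the sense labels and glosses are invented for illustration.
SENSES = {
    "bank": {
        "bank/finance": "an institution that accepts deposits and lends money",
        "bank/river": "sloping land beside a river or other body of water",
    }
}

# Small, assumed stopword list so that function words do not count as evidence.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on", "at", "by", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def disambiguate(word, context):
    """Return the sense of `word` whose gloss shares the most words with `context`."""
    context_words = tokenize(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & tokenize(gloss))
        # Ties are resolved in favor of the sense listed first.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "She opened an account to deposit money at the bank"))
print(disambiguate("bank", "They had a picnic on the bank of the river"))
```

Exact word overlap is deliberately crude: "deposit" in the context does not match "deposits" in the gloss, which hints at why practical systems add stemming, larger contexts, or learned models.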



Author information

Corresponding author

Correspondence to D. Yu. Turdakov.

Additional information

Original Russian Text © D.Yu. Turdakov, 2010, published in Programmirovanie, 2010, Vol. 36, No. 6.

About this article

Cite this article

Turdakov, D.Y. Word sense disambiguation methods. Program Comput Soft 36, 309–326 (2010). https://doi.org/10.1134/S0361768810060010

  • DOI: https://doi.org/10.1134/S0361768810060010
