Abstract
Word sense disambiguation is one of the key tasks of text processing. It consists of determining the senses of words or compound terms according to the context in which they are used. The problem originated in the 1950s as a subtask of machine translation. Since then, a great number of methods for solving it have been developed; however, none of them can be considered perfect. This paper surveys the most well-known studies in the field.
Additional information
Original Russian Text © D.Yu. Turdakov, 2010, published in Programmirovanie, 2010, Vol. 36, No. 6.
Cite this article
Turdakov, D.Y. Word sense disambiguation methods. Program Comput Soft 36, 309–326 (2010). https://doi.org/10.1134/S0361768810060010
DOI: https://doi.org/10.1134/S0361768810060010