ABSTRACT
This paper addresses the problem of modeling the relationships between words in the English language using a similarity graph. The graph stores the strength of the relationship between words as a decimal number. Both structured data from WordNet, such as the fact that "canine" is a hypernym of "dog" (i.e., a dog is a kind of canine), and textual descriptions, such as the definition of "dog": "a member of the genus Canis that has been domesticated by man since prehistoric times", are used to create the graph. We validate the quality of the graph by using our software, which relies on the graph, to compute the similarity of word pairs and comparing the results with those of studies performed with human subjects. To the best of our knowledge, our software correlates better with the results of both the Miller and Charles study and the WordSimilarity-353 study than any other published research.
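As a rough illustration of the data structure described above, the following Python sketch stores relationship strengths as decimal edge weights and adds edges from both a hypernym fact and a textual definition. It is a minimal sketch under stated assumptions, not the paper's algorithm: the class `SimilarityGraph`, the helper `add_definition_edges`, the stop word list, and the weights 0.9 and 0.3 are all hypothetical choices made for this example.

```python
from collections import defaultdict


class SimilarityGraph:
    """Directed graph whose edge weights are relationship strengths in [0, 1]."""

    def __init__(self):
        # edges[source][target] -> weight
        self.edges = defaultdict(dict)

    def add_edge(self, source, target, weight):
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be between 0 and 1")
        # Assumption: keep the strongest evidence seen for a pair.
        current = self.edges[source].get(target, 0.0)
        self.edges[source][target] = max(current, weight)

    def weight(self, source, target):
        # Returns 0.0 when no relationship is stored.
        return self.edges.get(source, {}).get(target, 0.0)


# Hypothetical stop word list, used only for this example.
STOPWORDS = {"a", "of", "the", "that", "has", "been", "by", "since"}


def add_definition_edges(graph, word, definition, weight):
    """Link a word to each non-stop word that appears in its definition."""
    for token in definition.lower().split():
        token = token.strip(".,;:\"'")
        if token and token not in STOPWORDS and token != word:
            graph.add_edge(word, token, weight)


graph = SimilarityGraph()

# Structured data: "canine" is a hypernym of "dog" (hypothetical weight).
graph.add_edge("dog", "canine", 0.9)

# Textual description: the definition of "dog" (hypothetical weight).
add_definition_edges(
    graph,
    "dog",
    "a member of the genus Canis that has been domesticated by man "
    "since prehistoric times",
    0.3,
)
```

Calling `graph.weight("dog", "canine")` then returns the stored strength for that pair, and pairs with no stored edge return 0.0.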