Creating a Similarity Graph from WordNet

Author:
Lubomir Stanchev

Indiana University -- Purdue University Fort Wayne, Fort Wayne, Indiana, USA

WIMS '14: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14), June 2014, Article No.: 36, Pages 1–11. https://doi.org/10.1145/2611040.2611055

Published: 02 June 2014

ABSTRACT

This paper addresses the problem of modeling the relationships between English words using a similarity graph. The graph stores the strength of the relationship between words, expressed as a decimal number. It is built from two kinds of WordNet data: structured data, such as the fact that "canine" is a hypernym of "dog" (i.e., a dog is a kind of canine), and textual descriptions, such as the definition of "dog": "a member of the genus Canis that has been domesticated by man since prehistoric times". The quality of the graph is validated by using our software to compute the similarity of word pairs from the graph and comparing the results with those of studies performed with human subjects. To the best of our knowledge, our software correlates better with the results of both the Miller and Charles study and the WordSimilarity-353 study than any other published research.
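
The idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy edge weights, the scoring rule (the largest product of edge weights along a path), and the use of Pearson correlation (the abstract does not say which correlation measure is used). The names `graph`, `best_path_strength`, and `pearson` are hypothetical.

```python
from math import sqrt

# Hypothetical toy graph: node -> {neighbor: edge weight in [0, 1]}.
# The weights are illustrative placeholders, not values from the paper.
graph = {
    "dog": {"canine": 0.6, "domesticated": 0.2},
    "canine": {"dog": 0.3, "wolf": 0.3},
    "wolf": {"canine": 0.6},
    "domesticated": {},
}


def best_path_strength(graph, source, target, max_hops=3):
    """Largest product of edge weights over simple paths of at most max_hops edges.

    This scoring rule is an assumption for the sketch, not the paper's formula.
    """
    best = 0.0
    # Each stack entry: (current node, product of weights so far, visited nodes).
    stack = [(source, 1.0, {source})]
    while stack:
        node, strength, visited = stack.pop()
        if node == target:
            best = max(best, strength)
            continue
        if len(visited) > max_hops:  # the path already uses max_hops edges
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                stack.append((neighbor, strength * weight, visited | {neighbor}))
    return best


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

In a validation of the kind the abstract describes, one list passed to `pearson` would hold graph-derived scores for a set of word pairs and the other the human ratings for the same pairs from a benchmark such as the Miller and Charles study.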

References

OWL Web Ontology Language Guide. http://www.w3.org/TR/owl-guide/.
D. Bollegala, Y. Matsuo, and M. Ishizuka. A Relational Model of Semantic Similarity Between Words Using Automatically Extracted Lexical Pattern Clusters from Web. Conference on Empirical Methods in Natural Language Processing, 2009.
L. Burnard. Reference Guide for the British National Corpus (XML Edition). http://www.natcorp.ox.ac.uk, 2007.
R. L. Cilibrasi and P. M. Vitanyi. The Google Similarity Distance. IEEE ITSOC Information Theory Workshop, 2005.
L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing Search in Context: The Concept Revisited. ACM Transactions on Information Systems, 20(1):116--131, January 2002.
C. Fox. Lexical Analysis and Stoplists. Information Retrieval: Data Structures and Algorithms, pages 102--130, 1992.
W. Frakes. Stemming Algorithms. Information Retrieval: Data Structures and Algorithms, pages 131--160, 1992.
G. Hirst and D. St-Onge. Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms. Fellbaum, pages 305--332, 1998.
M. Jarmasz. Roget's Thesaurus as a Lexical Resource for Natural Language Processing. Master's thesis, University of Ottawa, 1993.
G. Jeh and J. Widom. SimRank: A Measure of Structural-context Similarity. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 538--543, 2002.
J. Jiang and D. Conrath. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. Proceedings on International Conference on Research in Computational Linguistics, pages 19--33, 1997.
K. Jones. A Statistical Interpretation of Term Specificity and its Application in Retrieval. Journal of Documentation, 28(1):11--21, 1972.
R. Knappe, H. Bulskov, and T. Andreasen. Similarity Graphs. Fourteenth International Symposium on Foundations of Intelligent Systems, 2003.
S. Kulkarni and D. Caragea. Computation of the Semantic Relatedness Between Words Using Concept Clouds. International Conference of Knowledge Discovery and Information Retrieval, 2009.
C. Leacock and M. Chodorow. Combining Local Context and WordNet Similarity for Word Sense Identification. WordNet: An Electronic Lexical Database, pages 265--283, 1998.
D. Lin. An Information-theoretic Definition of Similarity. Proceedings of the Fifteenth International Conference on Machine Learning, pages 296--304, 1998.
J. B. MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pages 281--297, 1967.
M. F. Porter. An Algorithm for Suffix Stripping. Readings in Information Retrieval, pages 313--316, 1997.
G. Miller and W. Charles. Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 6(1):1--28, 1991.
G. A. Miller. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39--41, 1995.
Oracle. Berkeley DB. http://www.oracle.com.
R. Pan, Z. Ding, Y. Yu, and Y. Peng. A Bayesian Network Approach to Ontology Mapping. Proceedings of the Fourth International Semantic Web Conference, 2005.
J. Pearl. Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning. Proceedings of the 7th Conference of the Cognitive Science Society, University of California, Irvine, CA, pages 329--334, 1985.
P. Resnik. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. International Joint Conference on Artificial Intelligence, pages 448--453, 1995.
R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and Application of a Metric on Semantic Nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1):17--30, 1989.
Q. Rajput and S. Haider. Use of Bayesian Networks in Information Extraction from Unstructured Data Sources. Proceedings of International Conference on Ontological and Semantic Engineering, pages 325--331, 2009.
S. P. Ponzetto and M. Strube. Deriving a Large Scale Taxonomy from Wikipedia. 22nd International Conference on Artificial Intelligence, 2007.
E. Sirin and B. Parsia. SPARQL-DL: SPARQL Query for OWL-DL. 3rd OWL: Experiences and Directions Workshop (OWLED), 2007.
B. Spell. Java API for WordNet Searching (JAWS). http://lyle.smu.edu/tspell/jaws/index.html, 2009.
L. Stanchev. Building Semantic Corpus from WordNet. The First International Workshop on the Role of Semantic Web in Literature-Based Discovery, 2012.
L. Stanchev. Similarity Software. http://softbase.ipfw.edu:8080/Similarity, 2012.
M. Steyvers and J. Tenenbaum. The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth. Cognitive Science, 29(1):41--78, 2005.
M. Strube and S. P. Ponzetto. WikiRelate! Computing Semantic Relatedness Using Wikipedia. Association for the Advancement of Artificial Intelligence Conference, 2006.
J. Webber and I. Robinson. Graph Databases. O'Reilly, 2013.
Z. Wu and M. Palmer. Verb Semantics and Lexical Selection. Annual Meeting of the Association for Computational Linguistics, pages 133--138, 1994.
D. Yang and D. M. Powers. Measuring Semantic Similarity in the Taxonomy of WordNet. Australian Computer Science Conference, pages 315--322, 2005.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
WIMS '14: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14)
June 2014
506 pages
ISBN:9781450325387
DOI:10.1145/2611040
Program Chairs:
Rajendra Akerkar,
Nick Bassiliades,
John Davies,
Vadim Ermolayev
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
WordNet
similarity distance
similarity graph
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
WIMS '14 Paper Acceptance Rate: 41 of 90 submissions, 46%
Overall Acceptance Rate: 140 of 278 submissions, 50%
Article Metrics
- Total Citations: 13
- Total Downloads: 333
- Downloads (Last 12 months): 8
- Downloads (Last 6 weeks): 0