Named entity disambiguation by leveraging Wikipedia semantic knowledge

Authors:
Xianpei Han, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Jun Zhao, Institute of Automation, Chinese Academy of Sciences, Beijing, China

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management, November 2009, Pages 215–224
https://doi.org/10.1145/1645953.1645983

Published: 02 November 2009

ABSTRACT

The name ambiguity problem has created an urgent demand for efficient, high-quality named entity disambiguation methods. The key problem in named entity disambiguation is measuring the similarity between occurrences of names. Traditional methods measure this similarity using the bag of words (BOW) model. The BOW model, however, ignores semantic relations such as social relatedness between named entities, associative relatedness between concepts, and polysemy and synonymy between key terms, so it cannot reflect the actual similarity. Some research has investigated social networks as background knowledge for disambiguation. Social networks, however, capture only the social relatedness between named entities and often suffer from limited coverage.
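To make the BOW limitation concrete, the following is a minimal sketch of a bag-of-words cosine similarity between two name contexts. The function name `bow_cosine`, the whitespace tokenization, and the toy contexts are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between the raw term-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only terms that literally appear in both texts contribute to the score.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical contexts for three occurrences of the name "jordan".
ctx_nba = "jordan bulls nba"
ctx_basketball = "jordan basketball chicago"
ctx_ml = "jordan machine learning"

# Each pair shares only the token "jordan", so BOW scores them identically,
# even though the first two contexts are about the same topic.
print(bow_cosine(ctx_nba, ctx_basketball))
print(bow_cosine(ctx_nba, ctx_ml))
```

Because the score depends only on literal term overlap, related terms such as "bulls" and "basketball" contribute nothing, which is the gap the abstract describes.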

To overcome these deficiencies, this paper proposes using Wikipedia as the background knowledge for disambiguation; Wikipedia surpasses other knowledge bases in its coverage of concepts, rich semantic information, and up-to-date content. By leveraging Wikipedia's semantic knowledge, such as social relatedness between named entities and associative relatedness between concepts, we can measure the similarity between occurrences of names more accurately. In particular, we construct a large-scale semantic network from Wikipedia so that the semantic knowledge can be used efficiently and effectively. Based on the constructed semantic network, we propose a novel similarity measure that leverages Wikipedia semantic knowledge for disambiguation. The proposed method has been tested on the standard WePS data sets. Empirical results show that our method achieves a 10.7% improvement in disambiguation performance over traditional BOW-based methods and a 16.7% improvement over traditional social-network-based methods.
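The abstract does not spell out the proposed similarity measure, so the sketch below is only an illustration of the general idea, not the authors' method. It uses one link-based relatedness measure from the literature (the Milne and Witten 2008 formulation over incoming Wikipedia links) and a simple averaged best-match score between two sets of concepts. The function names, the averaging scheme, and the toy link data are all assumptions.

```python
import math


def link_relatedness(in_a: set, in_b: set, total_articles: int) -> float:
    """Milne-Witten style relatedness from the sets of articles linking to a and b."""
    if in_a and in_a == in_b:
        return 1.0
    common = in_a & in_b
    if not common:
        return 0.0
    big = max(len(in_a), len(in_b))
    small = min(len(in_a), len(in_b))
    denom = math.log(total_articles) - math.log(small)
    if denom <= 0:
        return 0.0
    distance = (math.log(big) - math.log(len(common))) / denom
    # Clamp so that weakly overlapping concepts never score below zero.
    return max(0.0, 1.0 - distance)


def semantic_similarity(concepts_a, concepts_b, inlinks, total_articles):
    """Average best-match relatedness between two concept lists (assumed scheme)."""
    if not concepts_a or not concepts_b:
        return 0.0

    def best(concept, others):
        return max(
            link_relatedness(inlinks[concept], inlinks[o], total_articles)
            for o in others
        )

    a_to_b = sum(best(c, concepts_b) for c in concepts_a) / len(concepts_a)
    b_to_a = sum(best(c, concepts_a) for c in concepts_b) / len(concepts_b)
    return (a_to_b + b_to_a) / 2


# Hypothetical incoming-link sets (article IDs) in a toy 1000-article wiki.
toy_inlinks = {
    "NBA": {1, 2, 3, 4},
    "Chicago Bulls": {2, 3, 4, 5},
    "Machine learning": {6, 7, 8},
}
TOTAL_ARTICLES = 1000

print(semantic_similarity(["NBA"], ["Chicago Bulls"], toy_inlinks, TOTAL_ARTICLES))
print(semantic_similarity(["NBA"], ["Machine learning"], toy_inlinks, TOTAL_ARTICLES))
```

Unlike term overlap, this score is high for "NBA" and "Chicago Bulls" because many of the same articles link to both, and zero for concepts with no shared incoming links.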


Index Terms

1. Information systems
  1. Information retrieval


Published in
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
November 2009
2162 pages
ISBN: 9781605585123
DOI: 10.1145/1645953
General Chairs: David Cheung (University of Hong Kong, Hong Kong), Il-Yeol Song (Drexel University, USA)
Program Chairs: Wesley Chu (UCLA, USA), Xiaohua Hu (Drexel University, USA), Jimmy Lin (University of Maryland, USA)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
coreference resolution
name ambiguity
named entity disambiguation
record linkage
semantic knowledge
Qualifiers
- research-article
Conference

Acceptance Rates
Overall acceptance rate: 1,861 of 8,427 submissions (22%)
