ABSTRACT
The name ambiguity problem has created an urgent demand for efficient, high-quality named entity disambiguation methods. The key problem in named entity disambiguation is measuring the similarity between occurrences of names. Traditional methods measure this similarity using the bag-of-words (BOW) model. The BOW model, however, ignores semantic relations such as social relatedness between named entities, associative relatedness between concepts, and polysemy and synonymy between key terms, so it cannot reflect the actual similarity. Some research has investigated social networks as background knowledge for disambiguation. Social networks, however, capture only the social relatedness between named entities and often suffer from limited coverage.
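The limitation of the BOW model described above can be illustrated with a minimal sketch. It assumes whitespace tokenization, raw term counts, and cosine similarity; the function name `bow_cosine` and the example contexts are hypothetical and are not taken from the paper or the WePS data.

```python
import math
from collections import Counter


def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts under a plain bag-of-words model."""
    # Assumption: lowercase whitespace tokens with raw counts (no stemming, no TF-IDF).
    bag_a = Counter(text_a.lower().split())
    bag_b = Counter(text_b.lower().split())
    # Only terms that literally occur in both texts contribute to the dot product.
    dot = sum(bag_a[t] * bag_b[t] for t in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(v * v for v in bag_a.values()))
    norm_b = math.sqrt(sum(v * v for v in bag_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical contexts surrounding two occurrences of the same ambiguous name
# (the name itself is left out, since it appears in every occurrence).
context_1 = "scored for the bulls in the nba finals"
context_2 = "basketball star from chicago"

# The contexts are topically related but share no surface term.
similarity = bow_cosine(context_1, context_2)
```

Because the two contexts have no term in common, the BOW similarity is zero even though both plausibly describe the same person. This is the kind of gap that background knowledge about related entities and concepts is meant to close.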
To overcome the deficiencies of previous methods, this paper proposes using Wikipedia as the background knowledge for disambiguation; Wikipedia surpasses other knowledge bases in its coverage of concepts, rich semantic information, and up-to-date content. By leveraging Wikipedia's semantic knowledge, such as social relatedness between named entities and associative relatedness between concepts, we can measure the similarity between occurrences of names more accurately. In particular, we construct a large-scale semantic network from Wikipedia so that the semantic knowledge can be used efficiently and effectively. Based on the constructed semantic network, we propose a novel similarity measure that leverages Wikipedia's semantic knowledge for disambiguation. The proposed method has been tested on the standard WePS data sets. Empirical results show that our method achieves a 10.7% improvement in disambiguation performance over traditional BOW-based methods and a 16.7% improvement over traditional social-network-based methods.
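The abstract does not specify the proposed similarity measure, so the following is only a rough sketch of the general idea of comparing contexts through concept relatedness rather than shared words. As a stand-in it uses the link-based relatedness measure of Milne and Witten (relatedness derived from the overlap of incoming Wikipedia links). The toy link sets, the article count, and the best-match averaging in `context_similarity` are all assumptions made for illustration.

```python
import math


def link_relatedness(links_a: set, links_b: set, total_articles: int) -> float:
    """Relatedness of two concepts from the overlap of their incoming links."""
    common = links_a & links_b
    if not common:
        return 0.0
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the size of each link set")
    # Normalized link distance: 0 when the link sets coincide, larger as they diverge.
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    return max(0.0, 1.0 - distance)


def context_similarity(concepts_a, concepts_b, inlinks, total_articles) -> float:
    """Assumed aggregation: average best-match relatedness, in both directions."""
    if not concepts_a or not concepts_b:
        return 0.0

    def best(concept, others):
        return max(
            link_relatedness(inlinks[concept], inlinks[o], total_articles)
            for o in others
        )

    forward = sum(best(c, concepts_b) for c in concepts_a) / len(concepts_a)
    backward = sum(best(c, concepts_a) for c in concepts_b) / len(concepts_b)
    return (forward + backward) / 2.0


# Toy data (hypothetical): concept -> set of articles that link to it.
inlinks = {
    "Chicago Bulls": {"NBA", "Michael Jordan", "Chicago", "Scottie Pippen"},
    "Basketball": {"NBA", "Michael Jordan", "Sport", "Scottie Pippen"},
    "Machine learning": {"Statistics", "Artificial intelligence", "Computer science"},
}
TOTAL_ARTICLES = 1000  # hypothetical size of the article collection

same_person = context_similarity(
    ["Chicago Bulls"], ["Basketball"], inlinks, TOTAL_ARTICLES
)
different_person = context_similarity(
    ["Chicago Bulls"], ["Machine learning"], inlinks, TOTAL_ARTICLES
)
```

In this toy setting, two contexts that mention different but related concepts receive a high similarity, while contexts about unrelated concepts receive zero, which a purely word-based comparison would not distinguish.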
Index Terms
- Named entity disambiguation by leveraging wikipedia semantic knowledge