DOI: 10.1145/1645953.1645983
research-article

Named entity disambiguation by leveraging Wikipedia semantic knowledge

Published: 02 November 2009

ABSTRACT

The name ambiguity problem has created an urgent demand for efficient, high-quality named entity disambiguation methods. The key problem in named entity disambiguation is measuring the similarity between occurrences of names. Traditional methods measure this similarity using the bag of words (BOW) model. The BOW model, however, ignores semantic relations such as social relatedness between named entities, associative relatedness between concepts, and polysemy and synonymy between key terms, so it cannot reflect the actual similarity. Some research has investigated social networks as background knowledge for disambiguation. Social networks, however, capture only the social relatedness between named entities, and they often suffer from limited coverage.
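The BOW limitation described above can be illustrated with a minimal sketch. It assumes raw term-frequency vectors and cosine similarity, which is one common BOW setup and not necessarily the exact baseline used in the paper; the function name `bow_cosine` and the two example contexts are hypothetical.

```python
import math
from collections import Counter


def bow_cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical contexts that plausibly describe the same person
# but share no surface terms.
context_a = "chicago bulls guard"
context_b = "nba basketball champion"

print(bow_cosine(context_a, context_b))
print(bow_cosine(context_a, context_a))
```

Because the two contexts have no term in common, the BOW score between them is zero even though their terms are closely related, which is the gap that background knowledge is meant to close.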

To overcome the deficiencies of previous methods, this paper proposes using Wikipedia as the background knowledge for disambiguation. Wikipedia surpasses other knowledge bases in its coverage of concepts, its rich semantic information, and its up-to-date content. By leveraging Wikipedia's semantic knowledge, such as social relatedness between named entities and associative relatedness between concepts, we can measure the similarity between occurrences of names more accurately. In particular, we construct a large-scale semantic network from Wikipedia so that the semantic knowledge can be used efficiently and effectively. Based on the constructed semantic network, we propose a novel similarity measure that leverages Wikipedia semantic knowledge for disambiguation. The proposed method has been tested on the standard WePS data sets. Empirical results show that our method achieves a 10.7% improvement in disambiguation performance over traditional BOW-based methods and a 16.7% improvement over traditional social-network-based methods.
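As a rough illustration of how concept relatedness can enter a similarity score, the sketch below compares two contexts by the relatedness of the concepts they mention. It is not the similarity measure proposed in the paper, which the abstract does not specify: the `RELATEDNESS` table, its scores, the concept names, and the best-match averaging scheme are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical pairwise relatedness scores in [0, 1]. In a real system these
# would be derived from a semantic network built from Wikipedia.
RELATEDNESS: Dict[FrozenSet[str], float] = {
    frozenset({"Chicago Bulls", "NBA"}): 0.8,
    frozenset({"Chicago Bulls", "Basketball"}): 0.7,
    frozenset({"NBA", "Basketball"}): 0.9,
}


def relatedness(c1: str, c2: str) -> float:
    """Look up relatedness; identical concepts score 1, unknown pairs 0."""
    if c1 == c2:
        return 1.0
    return RELATEDNESS.get(frozenset({c1, c2}), 0.0)


def directed_similarity(src: List[str], dst: List[str]) -> float:
    """Average, over concepts in src, of the best relatedness to any concept in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(relatedness(s, d) for d in dst) for s in src) / len(src)


def semantic_similarity(a: List[str], b: List[str]) -> float:
    """Symmetric score: mean of the two directed similarities."""
    return 0.5 * (directed_similarity(a, b) + directed_similarity(b, a))


# Hypothetical concept sets extracted from three occurrences of one name.
sports_a = ["Chicago Bulls"]
sports_b = ["NBA", "Basketball"]
research = ["Machine learning"]

print(semantic_similarity(sports_a, sports_b))
print(semantic_similarity(sports_a, research))
```

Under these assumed scores, the two sports contexts receive a high similarity despite sharing no concept, while the sports and research contexts receive none, so a clustering step would tend to group only the first pair as the same entity.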


    • Published in

      CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
      November 2009
      2162 pages
      ISBN:9781605585123
      DOI:10.1145/1645953

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
