DOI: 10.1145/1871437.1871451
research-article

Entity ranking using Wikipedia as a pivot

Published: 26 October 2010

ABSTRACT

In this paper we investigate the task of Entity Ranking on the Web. Searchers looking for entities are arguably better served by presenting a ranked list of entities directly, rather than a list of web pages with relevant but also potentially redundant information about these entities. Since entities are represented by their web homepages, a naive approach to entity ranking is to use standard text retrieval. Our experimental results clearly demonstrate that text retrieval is effective at finding relevant pages, but performs poorly at finding entities. Our proposal is to use Wikipedia as a pivot for finding entities on the Web, allowing us to reduce the hard web entity ranking problem to the easier problem of Wikipedia entity ranking. Wikipedia allows us to properly identify entities and some of their characteristics, and Wikipedia's elaborate category structure allows us to get a handle on the entity's type.

Our main findings are the following. Our first finding is that, in principle, the problem of web entity ranking can be reduced to Wikipedia entity ranking. We found that the majority of entity ranking topics in our test collections can be answered using Wikipedia, and that relevant web entities corresponding to the Wikipedia entities can be found with high precision using Wikipedia's 'external links'. Our second finding is that we can exploit the structure of Wikipedia to improve entity ranking effectiveness. Entity types are valuable retrieval cues in Wikipedia. Automatically assigned entity types are effective, and almost as good as manually assigned types. Our third finding is that web entity retrieval can be significantly improved by using Wikipedia as a pivot. Both Wikipedia's external links and the enriched Wikipedia entities with additional links to homepages are significantly better at finding primary web homepages than anchor text retrieval, which in turn significantly improves over standard text retrieval.
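The pivot idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `WikiEntity` record, the linear interpolation of a text score with a category-overlap score, the `weight` parameter, and the choice of the first external link as the homepage are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class WikiEntity:
    """Hypothetical record for one Wikipedia page considered as an entity."""
    title: str
    text_score: float  # assumed: retrieval score of the page for the query
    categories: Set[str] = field(default_factory=set)
    external_links: List[str] = field(default_factory=list)


def category_score(entity: WikiEntity, target_categories: Set[str]) -> float:
    """Fraction of the target categories the entity belongs to (illustrative)."""
    if not target_categories:
        return 0.0
    return len(entity.categories & target_categories) / len(target_categories)


def rank_web_entities(
    candidates: List[WikiEntity],
    target_categories: Set[str],
    weight: float = 0.5,
    top_k: int = 10,
) -> List[Tuple[str, str]]:
    """Rank Wikipedia entities, then pivot to the Web via external links.

    Step 1: score each Wikipedia entity by mixing its text score with its
            category overlap (an assumed linear interpolation).
    Step 2: for the top-ranked entities, return (title, homepage) pairs,
            taking the first external link as the candidate homepage and
            skipping entities that have no external link.
    """
    ranked = sorted(
        candidates,
        key=lambda e: (1 - weight) * e.text_score
        + weight * category_score(e, target_categories),
        reverse=True,
    )
    results = []
    for entity in ranked[:top_k]:
        if entity.external_links:
            results.append((entity.title, entity.external_links[0]))
    return results
```

In this sketch, an entity whose categories match the target type can outrank one with a higher text score, and entities without any external link drop out of the web result list, since there is nothing to pivot to.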


    • Published in

      CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
      October 2010
      2036 pages
      ISBN: 9781450300995
      DOI: 10.1145/1871437

      Copyright © 2010 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
