research-article

Entity ranking using Wikipedia as a pivot

Authors:
Rianne Kaptein, University of Amsterdam, Amsterdam, Netherlands
Pavel Serdyukov, Delft University of Technology, Delft, Netherlands
Arjen De Vries, Delft University of Technology and Centrum Wiskunde & Informatica, Delft, Netherlands
Jaap Kamps, University of Amsterdam, Amsterdam, Netherlands

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management, October 2010, Pages 69–78, https://doi.org/10.1145/1871437.1871451

Published: 26 October 2010

ABSTRACT

In this paper we investigate the task of Entity Ranking on the Web. Searchers looking for entities are arguably better served by presenting a ranked list of entities directly, rather than a list of web pages with relevant but also potentially redundant information about these entities. Since entities are represented by their web homepages, a naive approach to entity ranking is to use standard text retrieval. Our experimental results clearly demonstrate that text retrieval is effective at finding relevant pages, but performs poorly at finding entities. Our proposal is to use Wikipedia as a pivot for finding entities on the Web, allowing us to reduce the hard web entity ranking problem to the easier problem of Wikipedia entity ranking. Wikipedia allows us to properly identify entities and some of their characteristics, and Wikipedia's elaborate category structure allows us to get a handle on the entity's type.

Our main findings are the following. Our first finding is that, in principle, the problem of web entity ranking can be reduced to Wikipedia entity ranking. We found that the majority of entity ranking topics in our test collections can be answered using Wikipedia, and that relevant web entities corresponding to the Wikipedia entities can be found with high precision using Wikipedia's 'external links'. Our second finding is that we can exploit the structure of Wikipedia to improve entity ranking effectiveness. Entity types are valuable retrieval cues in Wikipedia. Automatically assigned entity types are effective, and almost as good as manually assigned types. Our third finding is that web entity retrieval can be significantly improved by using Wikipedia as a pivot. Both Wikipedia's external links and the enriched Wikipedia entities with additional links to homepages are significantly better at finding primary web homepages than anchor text retrieval, which in turn significantly improves over standard text retrieval.
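The pivot idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the term-overlap scorer is a toy stand-in for a real retrieval model, and the `WikiPage` class, the `type_weight` parameter, the function names, the sample pages, and the example.org URLs are all assumptions made for illustration. The sketch ranks Wikipedia pages by combining a text score with a category (entity type) match, then follows each ranked page's external links to reach a candidate web homepage.

```python
from dataclasses import dataclass, field


@dataclass
class WikiPage:
    """Hypothetical, simplified view of a Wikipedia page."""
    title: str
    text: str
    categories: set = field(default_factory=set)
    external_links: list = field(default_factory=list)


def text_score(query, page):
    """Fraction of query terms found in the page text (toy scorer, an assumption)."""
    terms = query.lower().split()
    words = set(page.text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def rank_wikipedia_entities(query, target_category, pages, type_weight=0.5):
    """Rank pages by a weighted mix of text score and entity-type match."""
    scored = []
    for page in pages:
        type_match = 1.0 if target_category in page.categories else 0.0
        score = (1 - type_weight) * text_score(query, page) + type_weight * type_match
        scored.append((score, page))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page for score, page in scored if score > 0]


def pivot_to_homepages(ranked_pages):
    """Map ranked Wikipedia entities to web pages via their first external link."""
    return [(p.title, p.external_links[0]) for p in ranked_pages if p.external_links]


# Toy collection; titles, texts, and URLs are invented for this example.
pages = [
    WikiPage("Schiphol", "airport near amsterdam used by dutch airline companies",
             {"Airports"}, ["https://example.org/schiphol"]),
    WikiPage("KLM", "dutch airline based in amsterdam",
             {"Airlines"}, ["https://example.org/klm"]),
    WikiPage("Tulip", "flower popular in the netherlands", {"Flowers"}, []),
]

ranked = rank_wikipedia_entities("dutch airline", "Airlines", pages)
homepages = pivot_to_homepages(ranked)
print(homepages)
```

In this toy collection both "KLM" and "Schiphol" match the query text equally well, so the category match alone decides their order, which mirrors the abstract's claim that entity types are valuable retrieval cues.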

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results

Published in
CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN:9781450300995
DOI:10.1145/1871437
General Chair: Jimmy Huang, York University, Canada
Program Chairs:
Nick Koudas, University of Toronto, Canada
Gareth Jones, Dublin City University, Ireland
Xindong Wu, University of Vermont, USA
Kevyn Collins-Thompson, Microsoft Research, USA
Aijun An, York University, Canada
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
web entity ranking
wikipedia
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
Article Metrics
- Total Citations: 53
- Total Downloads: 566
- Downloads (Last 12 months): 7
- Downloads (Last 6 weeks): 0