Abstract
In response to a query, a search engine returns a ranked list of documents. If the query is about a popular topic (i.e., it matches many documents), then the returned list is usually too long to view fully. Studies show that users usually look at only the top 10 to 20 results. However, we can exploit the fact that the best targets for popular topics are usually linked to by enthusiasts in the same domain. In this paper, we propose a novel ranking scheme for popular topics that places the most authoritative pages on the query topic at the top of the ranking. Our algorithm operates on a special index of "expert documents." These are a subset of the pages on the WWW identified as directories of links to non-affiliated sources on specific topics. Results are ranked based on the match between the query and relevant descriptive text for hyperlinks on expert pages pointing to a given result page. We present a prototype search engine that implements our ranking scheme and discuss its performance. With a relatively small (2.5 million page) expert index, our algorithm was able to perform comparably on popular queries with the best of the mainstream search engines.
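The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's scoring function: the data layout (`owner`, `links`), the term-overlap score, and the `min_experts` agreement threshold are all assumptions made for illustration. Each expert page contributes the best match between the query and the descriptive text of its links to a target, and a target is kept only if experts with different owners point to it.

```python
from collections import defaultdict


def rank_targets(query, expert_pages, min_experts=2):
    """Rank target URLs by how well expert link text matches the query.

    Hypothetical layout: each expert page is a dict with an "owner"
    (a stand-in for affiliation) and "links", a list of
    (descriptive_text, target_url) pairs.
    """
    terms = set(query.lower().split())
    if not terms:
        return []

    scores = defaultdict(float)
    supporters = defaultdict(set)

    for expert in expert_pages:
        # Keep only the best-matching link per target on this expert page.
        best = {}
        for text, target in expert["links"]:
            overlap = len(terms & set(text.lower().split())) / len(terms)
            if overlap > 0:
                best[target] = max(best.get(target, 0.0), overlap)
        for target, overlap in best.items():
            scores[target] += overlap
            supporters[target].add(expert["owner"])

    # Assumed agreement rule: require support from distinct owners.
    ranked = [
        (target, score)
        for target, score in scores.items()
        if len(supporters[target]) >= min_experts
    ]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    experts = [
        {"owner": "a.example", "links": [("python tutorial", "http://t1"),
                                         ("cooking tips", "http://t2")]},
        {"owner": "b.example", "links": [("python guide", "http://t1")]},
        {"owner": "a.example", "links": [("python tutorial", "http://t3")]},
    ]
    print(rank_targets("python tutorial", experts))
```

In this toy data, `http://t3` is dropped because only one owner links to it, which mirrors the abstract's emphasis on non-affiliated sources; a real system would need a more careful affiliation test than comparing an owner string.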
Index Terms
- When experts agree: using non-affiliated experts to rank popular topics