ABSTRACT
PageRank has been known to be a successful algorithm in ranking web sources. In order to avoid the rank sink problem, PageRank assumes that a surfer, being in a page, jumps to a random page with a certain probability. In the standard PageRank algorithm, the jumping probabilities are assumed to be the same for all the pages, regardless of the page properties. This is not the case in the real world, since presumably a surfer would more likely follow the out-links of a high-quality hub page than follow the links of a low-quality one. In this poster, we propose a novel algorithm "Dirichlet PageRank" to address this problem by adapting exible jumping probabilities based on the number of out-links in a page. Empirical results on TREC data show that our method outperforms the standard PageRank algorithm.
- S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7):107--117, 1998. Google ScholarDigital Library
- D. Cai, X. He, J.-R. Wen, and W.-Y. Ma. Block-level link analysis. In SIGIR, pages 440--447, 2004. Google ScholarDigital Library
- J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5):604--632, 1999. Google ScholarDigital Library
- C. Zhai and J. D. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In SIGIR, pages 334--342, 2001. Google ScholarDigital Library
Index Terms
- Dirichlet PageRank
Recommendations
Beyond PageRank: machine learning for static ranking
WWW '06: Proceedings of the 15th international conference on World Wide WebSince the publication of Brin and Page's paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We show that we can significantly outperform PageRank using features that are ...
Associated pagerank: improved pagerank measured by frequent term sets
VECIMS'09: Proceedings of the 2009 IEEE international conference on Virtual Environments, Human-Computer Interfaces and Measurement SystemsWeb search engines encounter many new challenges while the amount of information on the web increases rapidly. Web documents have been a main resource for various purposes, and people rely on search engines to retrieve the desired documents. This paper ...
Topic-sensitive PageRank
WWW '02: Proceedings of the 11th international conference on World Wide WebIn the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search ...
Comments