Abstract
Early search engines retrieved relevant pages for the user based primarily on the content similarity of the user query and the indexed pages of the search engines. The retrieval and ranking algorithms were simply direct implementation of those from information retrieval. Starting from 1996, it became clear that content similarity alone was no longer sufficient for search due to two reasons. First, the number of Web pages grew rapidly during the middle to late 1990s. Given any query, the number of relevant pages can be huge. For example, given the search query “classification technique”, the Google search engine estimates that there are about 10 million relevant pages. This abundance of information causes a major problem for ranking, i.e., how to choose only 30–40 pages and rank them suitably to present to the user. Second, content similarity methods are easily spammed. A page owner can repeat some important words and add many remotely related words in his/her pages to boost the rankings of the pages and/or to make the pages relevant to a large number of possible queries.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
(2007). Link Analysis. In: Web Data Mining. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-37882-2_7
Download citation
DOI: https://doi.org/10.1007/978-3-540-37882-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37881-5
Online ISBN: 978-3-540-37882-2
eBook Packages: Computer ScienceComputer Science (R0)