Abstract
The purpose of a search engine is to retrieve, from a given textual collection, the documents deemed relevant to a user query. Typically, a user query is modeled as a set of keywords, and a document is a Web page, a PDF file, or any other file that can be parsed into a set of tokens (words). Documents are ranked in a flat list according to some measure of relevance to the user query. That list contains hyperlinks to the relevant documents, their titles, and the so-called (page or web) snippets: document excerpts that let the user judge whether a document is actually relevant without opening it.
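The flat-list pipeline described above (keywords in; ranked titles, links, and snippets out) can be sketched as follows. This is a minimal, hypothetical illustration and not the method of this paper or of any particular engine: the relevance measure is assumed to be a plain count of query-keyword occurrences, and the snippet is assumed to be a fixed window of words around the first matching keyword. The names `search`, `make_snippet`, `tokenize`, and `Result` are introduced here for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    title: str
    snippet: str
    score: int


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def make_snippet(text: str, keywords: set[str], width: int = 5) -> str:
    """Assumed snippet rule: a window of words around the first keyword hit."""
    words = text.split()
    for i, word in enumerate(words):
        if set(tokenize(word)) & keywords:
            start = max(0, i - width)
            return " ".join(words[start:i + width + 1])
    # No keyword found: fall back to the opening words of the document.
    return " ".join(words[: 2 * width + 1])


def search(query: str, docs: dict[str, tuple[str, str]]) -> list[Result]:
    """Rank documents in a flat list by a hypothetical relevance measure."""
    keywords = set(tokenize(query))
    results = []
    for url, (title, body) in docs.items():
        # Hypothetical relevance measure: number of query-keyword occurrences.
        score = sum(1 for token in tokenize(body) if token in keywords)
        if score > 0:
            results.append(Result(url, title, make_snippet(body, keywords), score))
    # Highest score first; ties broken by URL so the order is deterministic.
    results.sort(key=lambda r: (-r.score, r.url))
    return results


if __name__ == "__main__":
    docs = {
        "http://a.example": ("Jaguar cars", "The jaguar is a luxury car brand owned by Tata Motors."),
        "http://b.example": ("Jaguar (animal)", "The jaguar is a large cat. A jaguar hunts at night."),
        "http://c.example": ("Python", "Python is a programming language."),
    }
    for r in search("jaguar", docs):
        print(r.score, r.url, r.title, "|", r.snippet)
```

With the sample data, the page mentioning "jaguar" twice is listed first and the page without the keyword is dropped. Counting occurrences keeps the sketch short; a real engine would typically use a weighted measure and smarter snippet selection.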
Partially supported by the Italian MIUR projects ALINWEB and ECD, and by the Italian Registry of ccTLD.it.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ferragina, P., Gullì, A. (2004). The Anatomy of SnakeT: A Hierarchical Clustering Engine for Web-Page Snippets. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_48
DOI: https://doi.org/10.1007/978-3-540-30116-5_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive