Adding support for dynamic and focused search with Fetuccino
Introduction and motivation
The Web keeps growing at a phenomenal rate. From the perspective of search technology, this growth has two important characteristics. First, the `update' policy is totally uncontrolled, with millions of users creating, modifying and deleting content at will, and linking it to other content in an unstructured manner. Second, Web growth is increasingly fueled by the addition of dynamically and automatically generated content [18].
These characteristics impact the major criteria of evaluation for
The approach
Dynamic search is primarily distinguished from conventional static search in that the former involves fetching the actual documents at the time the query is issued and analyzing their relevance to the search query on the fly, while the latter is based on evaluating the query against pre-computed repositories. Obviously, dynamic search cannot be employed `from scratch' because of the high cost of text analysis and the huge search space, and therefore it requires a starting point for recursive
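As an illustration of the contrast drawn above, the following minimal sketch fetches documents at query time, scores them on the fly, and explores recursively outward from a set of starting points. It is not the Fetuccino implementation: the in-memory `TOY_WEB`, the `fetch` helper, the term-overlap `relevance` function, and the `max_depth` and `threshold` parameters are all hypothetical and chosen only to keep the example self-contained.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
# A real system would fetch pages over HTTP at query time.
TOY_WEB = {
    "a": ("jaguar cars and engines", ["b", "c"]),
    "b": ("jaguar car dealers", ["d"]),
    "c": ("big cats of the jungle", ["e"]),
    "d": ("car engines explained", []),
    "e": ("jaguar habitat", []),
}


def fetch(url):
    """Return (text, links) for a URL, or None if it cannot be fetched."""
    return TOY_WEB.get(url)


def relevance(query, text):
    """Assumed scoring: fraction of query terms that occur in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def dynamic_search(query, seeds, max_depth=2, threshold=0.5):
    """Explore breadth-first from the seeds, scoring each page as it is fetched.

    Links are followed only from pages that score at least `threshold`,
    and never deeper than `max_depth` hops from a seed.
    """
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    results = []
    while queue:
        url, depth = queue.popleft()
        page = fetch(url)
        if page is None:
            continue
        text, links = page
        score = relevance(query, text)
        if score < threshold:
            continue  # irrelevant page: do not report it or follow its links
        results.append((url, score))
        if depth < max_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    # Highest-scoring pages first; ties keep discovery order.
    return sorted(results, key=lambda pair: -pair[1])


hits = dynamic_search("jaguar engines", seeds=["a"])
```

In this sketch the seeds play the role of the starting point that dynamic search requires, for instance results returned by a static engine. Pruning at irrelevant pages keeps the explored space small, at the price of missing relevant pages that are reachable only through irrelevant ones (page `e` here).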
A two phase approach
A major problem when conducting searches over a large heterogeneous and uncontrolled document set such as the Web is with the quality of the results for ambiguous queries. Indeed, it happens quite often that the user discovers only after issuing a search that the query expression s/he picked has a different interpretation in a totally irrelevant domain. This might happen even if the user is an expert at expressing clear and precise queries, simply because of the large scope of the Web. This
Conclusion and future work
The accelerated growth of the Web might cause traditional purely static search engines to become less accurate and less effective over time, even if their indexing, retrieval, and storage techniques are likely to improve. Fetuccino addresses this shortcoming by augmenting static search services with text-based dynamic exploration in the vicinity of the search results, thereby discovering new relevant information and validating old information. The key to making an effective use of the
Acknowledgements
We thank Jon Kleinberg and Ron Fagin for useful discussions on Hubs and Authorities (and Ron Pinter for putting us in contact). We are also grateful to Prabhakar Raghavan for letting us use the Clever system. Finally, we are indebted to Dirk Nicol for hosting Mapuccino and Fetuccino on the IBM Corporate Java Site. This research was done while Dan Pelleg was an extern student at the IBM Haifa Research Laboratory.
References (22)
- K. Bharat, A. Broder, M. Henzinger, P. Kumar and S. Venkatasubramanian, The connectivity server: fast access to linkage...
- S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, in: Proc. 7th International World...
- J. Carriere and R. Katzman, WebQuery: searching and visualizing the Web through connectivity, in: Proc. of the 6th...
- S. Chakrabarti, B. Dom, D. Gibson, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins, Experiments in topic...
- S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan and S. Rajagopalan, Automatic resource list compilation by...
- D.R. Cutting, D.R. Karger, J.O. Pedersen and J.W. Tukey, Scatter/gather: a cluster-based approach to browsing large...
- P. De Bra, G.-J. Houben, Y. Kornatzky and R. Post, Information retrieval in distributed hypertexts, in: Proc. of...
- D. Gibson, J. Kleinberg and P. Raghavan, Inferring Web communities from link topology, in: Proc. of the 9th ACM...
- M. Herscovici, M. Jacovi, Y.S. Maarek, D. Pelleg, M. Shtalhaim and S. Ur, The shark-search algorithm: an application:...
- J. Kleinberg, Authoritative sources in a hyperlinked environment, in: Proc. of the 9th ACM-SIAM Symposium on Discrete...
Issy Ben-Shaul is a faculty member in the department of Electrical Engineering at the Technion — Israel Institute of Technology, and a consultant in the Information Retrieval and Organization Group at the IBM Haifa Research Lab. He received his BSc in Mathematics and Computer Science from Tel Aviv University in 1988, and his MS and PhD in Computer Science from Columbia University in 1991 and 1995, respectively. During 1995, before joining the Technion, Ben-Shaul was a research staff member at the IBM research laboratory in Haifa, and worked on applications and extensions of clustering technology to the Internet. He is leading the Distributed Systems Group and the associated software systems laboratory at the Technion. His research interests include distributed and mobile systems, software engineering, information retrieval, Web, advanced transactions, workflow management systems and electronic commerce. He has published over 30 papers in refereed journals and conference
Michael Herscovici is a Research Staff Member at the IBM Haifa Research Lab in Haifa, Israel and belongs to the `Information Retrieval and Organization' Group. His research interests include Internet applications and parsing techniques. Mr. Herscovici received his B.Sc. in Computer Science from the Technion, Israel Institute of Technology in Haifa, in 1998. He joined IBM in 1997 and has since worked on the dedicated robot component of Mapuccino, a Web site mapping tool.
Michal Jacovi is a Research Staff Member at the IBM Haifa Research Lab in Haifa, Israel, and belongs to the `Information Retrieval and Organization' Group. Her research interests include Internet applications, user interfaces, and visualization. She received her M.Sc. in Computer Science from the Technion, Haifa, Israel, in 1993. Ms. Jacovi joined IBM in 1993 and has worked on several projects involving user interfaces and object-oriented technology, some of which have been published in journals and conferences. Since the emergence of Java, she has been involved in the conception and implementation of Mapuccino, a Web site mapping tool, written in Java, that is being integrated into several IBM products.
Yoelle S. Maarek is a Research Staff Member at the IBM Haifa Research Lab in Haifa, Israel and manages the `Information Retrieval and Organization' Group, which has about 15 members. Her research interests include information retrieval, Internet applications, and software reuse. She graduated from the `Ecole Nationale des Ponts et Chaussees', Paris, France, and received her D.E.A. (graduate degree) in Computer Science from Paris VI University in 1985. She received a Doctor of Science degree from the Technion, Haifa, Israel, in January 1989. Before joining IBM Israel, Dr. Maarek was a research staff member at the IBM T.J. Watson Research Center for about 5 years. She serves on the program committees of several international conferences and is a member of the Review Board of the WebNet Journal. She has published over 25 papers in refereed journals and conferences.
Dan Pelleg received his B.A. in 1995 and his MSc in 1998 from the Department of Computer Science, Technion, Haifa, Israel. His Master's thesis topic was `Phylogeny Approximation via Semidefinite Programming'. He is currently a PhD candidate in the CS Dept. at Carnegie-Mellon University. His research interests include computational biology, combinatorial optimization and Web-based software agents. During the summers of 1997 and 1998, Dan worked as an extern student at the IBM Haifa Research Laboratory.
Menachem Shtalhaim is a Research Staff Member at the IBM Haifa Research Lab in Haifa, Israel and belongs to the `Information Retrieval and Organization' Group. His research interests include Internet applications, communication protocols and heuristic algorithms. Mr. Shtalhaim joined IBM in 1993 and has worked on several projects involving morphological analysis tools, a network design and analysis tool (the IBM product NetDA/2), and the AS400 logical file system layer. In the past, Mr. Shtalhaim has worked on medical diagnostic systems. He is the author of the dedicated robot component of Mapuccino, a Web site mapping tool.
Vladimir Soroka is a Research Staff Member at the IBM Haifa Research Lab in Haifa, Israel and belongs to the `Information Retrieval and Organization' Group. His research interests include Internet applications and Information organization. Mr. Soroka received his B.Sc. in Computer Science from the Technion, Israel Institute of Technology in Haifa, Israel in 1996. Before joining IBM, Mr. Soroka worked on Internet-based fax servers. He joined IBM in 1998 and has since worked on various applications such as Mapuccino, a Web site mapping tool.
Sigalit Ur is a Research Staff Member at the IBM Haifa Research Lab in Haifa, Israel, working on Mapuccino, a Web site mapping tool, written in Java, that is being integrated into several IBM products. She received a Master of Science degree in Intelligent Systems from the University of Pittsburgh in 1993. Before joining IBM, Ms. Ur was involved in projects in a wide variety of fields, including data processing, databases, cognitive science, multi-agent planning and image processing, some of which have been published in journals and conferences.