Differences and Identities in Document Retrieval in an Annotation Environment

Bottoni, Paolo; Cuomo, Michele; Levialdi, Stefano; Panizzi, Emanuele; Passavanti, Marco; Trinchese, Rossella

doi:10.1007/978-3-540-75512-8_11


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4777)

Abstract

Digital annotation of web pages raises two problems that are unknown to traditional annotation and stem from the dynamicity and openness of the Web. The first problem is that a document can be replicated over multiple sites, so that it can be retrieved over the Web at different URLs or through different queries. A web page must therefore be associated with all the annotations pertaining to its content, even those created while the same content was accessed under a different URL. The second problem concerns the evolution of individual HTML pages, whose changes often consist of insertions, deletions or movements of page segments. Annotations attached to portions of a page that have moved within the page itself should still be retrieved and shown to the user. To reduce the impact of these phenomena on the usefulness of the annotation process, our annotation system MADCOW incorporates two algorithms: one assesses the identity of two pages under different URLs, the other detects the differences between two versions of a page under the same URL. The system then takes the appropriate actions to retrieve all the pertaining annotations.



Author information

Authors and Affiliations

Department of Computer Science, University of Rome “La Sapienza”, Via Salaria 113, 00198, Rome, Italy
Paolo Bottoni, Michele Cuomo, Stefano Levialdi, Emanuele Panizzi, Marco Passavanti & Rossella Trinchese


Editor information

Subhash Bhalla


About this paper

Cite this paper

Bottoni, P., Cuomo, M., Levialdi, S., Panizzi, E., Passavanti, M., Trinchese, R. (2007). Differences and Identities in Document Retrieval in an Annotation Environment. In: Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2007. Lecture Notes in Computer Science, vol 4777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75512-8_11


DOI: https://doi.org/10.1007/978-3-540-75512-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75511-1
Online ISBN: 978-3-540-75512-8
eBook Packages: Computer Science, Computer Science (R0)
