The AT&T Internet Difference Engine: Tracking and viewing changes on the web

World Wide Web

Abstract

The AT&T Internet Difference Engine (AIDE) is a system that finds and displays changes to pages on the World Wide Web. The system consists of several components, including a web‐crawler that detects changes, an archive of past versions of pages, a tool called HtmlDiff to highlight changes between versions of a page, and a graphical interface to view the relationship between pages over time. This paper describes AIDE, with an emphasis on the evolution of the system and experiences with it. It also raises some sociological and legal issues.
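As a rough illustration of two of the components named in the abstract (detecting that a page has changed, and highlighting what changed between versions), the following Python sketch compares two stored versions of a page. It is not AIDE's implementation and not the HtmlDiff algorithm: the function names `checksum`, `has_changed`, and `mark_changes`, the SHA-256 digest, the word-level comparison via the standard `difflib` module, and the `[-...-]` / `[+...+]` markers are all assumptions chosen for the example.

```python
import difflib
import hashlib


def checksum(page: str) -> str:
    """Return a hex digest of the page text (hypothetical helper)."""
    return hashlib.sha256(page.encode("utf-8")).hexdigest()


def has_changed(old: str, new: str) -> bool:
    """Cheap change test: compare digests instead of full page bodies."""
    return checksum(old) != checksum(new)


def mark_changes(old: str, new: str) -> str:
    """Word-level comparison of two versions.

    Deleted words are wrapped in [-...-] and inserted words in [+...+].
    The markers are an assumption for this sketch, not HtmlDiff's output.
    """
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("[+" + " ".join(new_words[j1:j2]) + "+]")
    return " ".join(out)


if __name__ == "__main__":
    archived = "The page was updated in May"
    current = "The page was updated in June today"
    if has_changed(archived, current):
        print(mark_changes(archived, current))
```

Storing only a digest per page keeps the change test cheap, while the full comparison needs both versions at hand, which is one reason a system of this kind would keep an archive of past versions.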

Cite this article

Douglis, F., Ball, T., Chen, Y. et al. The AT&T Internet Difference Engine: Tracking and viewing changes on the web. World Wide Web 1, 27–44 (1998). https://doi.org/10.1023/A:1019243126596

  • DOI: https://doi.org/10.1023/A:1019243126596
