The TELLTALE dynamic hypertext environment: Approaches to scalability

  • Chapter
  • First Online:
Intelligent Hypertext (WIH 1994, WIH 1993)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1326)

Abstract

Methods and tools for finding documents relevant to a user's needs in document corpora can be found in the information retrieval, library science, and hypertext communities. Typically, these systems provide retrieval capabilities for fairly static corpora, their algorithms depend on the language for which they were written (e.g., English), and they do not perform well when presented with misspelled words or with text degraded by OCR (optical character recognition). In this chapter, we present the TELLTALE system. TELLTALE is a dynamic hypertext environment that uses several techniques based on n-grams (sequences of n characters of text) to provide full-text search, from a hypertext-style user interface, over text corpora that may be garbled by OCR or transmission errors and that may contain languages other than English. We identify methods and techniques that we have applied to the n-gram data structures, and we discuss algorithms that we used to enhance the scalability of the TELLTALE Dynamic Hypertext System.
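The abstract's central idea is to compare texts through overlapping character n-grams instead of words. The following is a minimal sketch of that idea, not TELLTALE's implementation. It makes several assumptions: trigrams (n = 3, an arbitrary choice for a short demonstration), case folding, whitespace collapsing, and cosine similarity over raw n-gram counts. The function names `ngrams` and `cosine` and the sample strings are hypothetical.

```python
from collections import Counter
import math


def ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in `text`.

    Assumption: case is folded and runs of whitespace collapse to one space.
    No tokenizer, dictionary, or stemmer is used, so nothing here is
    specific to English.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram frequency vectors."""
    # Counter returns 0 for a missing key without inserting it.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts: the garbled one imitates OCR errors
# ("m" read as "rn", "e" read as "c", "o" read as "0").
query = ngrams("dynamic hypertext environment")
clean = ngrams("a dynamic hypertext environment for text corpora")
garbled = ngrams("a dynarnic hypertcxt envir0nment for text corpora")
unrelated = ngrams("quarterly budget spreadsheet")

print(round(cosine(query, clean), 3))
print(round(cosine(query, garbled), 3))
print(round(cosine(query, unrelated), 3))
```

In this sketch, each character-level error only disturbs the few n-grams that overlap it. Most of the garbled document's trigrams still match the query, so it remains far closer to the query than the unrelated text does.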

Editor information

Charles Nicholas, James Mayfield

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Pearce, C., Miller, E. (1997). The TELLTALE dynamic hypertext environment: Approaches to scalability. In: Nicholas, C., Mayfield, J. (eds) Intelligent Hypertext. WIH 1994, WIH 1993. Lecture Notes in Computer Science, vol 1326. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0023962

  • DOI: https://doi.org/10.1007/BFb0023962

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63637-3

  • Online ISBN: 978-3-540-69622-3

  • eBook Packages: Springer Book Archive