On the construction of effective vocabularies for information retrieval

Published: 04 November 1973

Abstract

Natural language query formulations exhibit advantages over artificial language statements, since they permit the user to approach the retrieval environment without prior training and without using intermediaries. To obtain adequate retrieval output, it is, however, necessary to emphasize the good terms and to deemphasize the bad ones. The usefulness of the terms in a natural language vocabulary is first characterized in terms of their frequency distribution over the documents of a collection. The construction of "good" natural language vocabularies is then described, and methods are given for improving the vocabulary by transforming terms that operate poorly for retrieval purposes into better ones.
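
As a rough illustration of characterizing terms by their frequency distribution over the documents of a collection, the following minimal Python sketch counts how many documents each term occurs in and derives a weight that emphasizes rare terms and deemphasizes common ones. The function names, the toy collection, and the `log(N / df)` weighting are assumptions chosen for illustration; the abstract does not specify the paper's actual weighting scheme or vocabulary transformations.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, the number of documents in which it occurs."""
    df = Counter()
    for doc in docs:
        # A set ensures each term is counted at most once per document.
        df.update(set(doc.lower().split()))
    return df


def inverse_frequency_weights(df, n_docs):
    """Assumed weighting: log(N / df), so rarer terms receive larger weights."""
    return {term: math.log(n_docs / count) for term, count in df.items()}


# Toy collection (hypothetical data, for illustration only).
docs = [
    "retrieval of documents by term",
    "term weighting for retrieval",
    "natural language query term",
]

df = document_frequencies(docs)
weights = inverse_frequency_weights(df, len(docs))

# Print terms from most to least frequent, with their weights.
for term, count in df.most_common():
    print(f"{term}: df={count}, weight={weights[term]:.3f}")
```

In this toy collection, a term that occurs in every document receives a weight of zero, while a term confined to a single document receives the largest weight. This is one simple way to separate "good" from "bad" terms, not necessarily the one used in the paper.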

References

  1. H. P. Luhn, A Statistical Approach to Mechanized Encoding and Searching of Literary Information, IBM Journal of Research and Development, Vol. 1, No. 4, October 1957, pp. 309-317.
  2. K. Sparck Jones, A Statistical Interpretation of Term Specificity and its Application to Retrieval, Journal of Documentation, Vol. 28, No. 1, March 1972, pp. 11-20.
  3. G. Salton and C. S. Yang, On the Specification of Term Values in Automatic Indexing, Computer Science Technical Report No. 73-173, Cornell University, June 1972, to appear in Journal of Documentation.
  4. K. Bonwit and J. Aste Tonsman, Negative Dictionaries, Scientific Report No. ISR-18, Section VI, Dept. of Computer Science, Cornell University, October 1970.
  5. G. Salton, Experiments in Automatic Thesaurus Construction for Information Retrieval, Information Processing-71, North Holland Publishing Co., Amsterdam, 1972, pp. 115-123.
  6. S. Herzog and H. Kargman, Modification and Combination of Nondiscriminating Concepts in a Document Collection, Term Report CS 435, Dept. of Computer Science, Cornell University, Ithaca, May 1973.

      • Published in

        ACM SIGPLAN Notices, Volume 10, Issue 1
        Proceedings of ACM SIGPLAN - SIGIR interface meeting
        January 1975
        182 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/951787

        • ACM Conferences
          SIGPLAN '73: Proceedings of the 1973 meeting on Programming languages and information retrieval
          November 1973
          190 pages
          ISBN: 9781450374392
          DOI: 10.1145/951762

        Copyright © 1973 Authors

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • article
