Abstract
Natural language query formulations exhibit advantages over artificial language statements, since they permit the user to approach the retrieval environment without prior training and without using intermediaries. To obtain adequate retrieval output, it is, however, necessary to emphasize the good terms and to deemphasize the bad ones. The usefulness of the terms in a natural language vocabulary is first characterized in terms of their frequency distribution over the documents of a collection. The construction of "good" natural language vocabularies is then described, and methods are given for improving the vocabulary by transforming terms that operate poorly for retrieval purposes into better ones.
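As a rough illustration of characterizing terms by their frequency distribution over the documents of a collection, the sketch below counts in how many documents each term occurs and sorts terms into three bands. It is not the paper's procedure: the function names, the whitespace tokenizer, the band labels, and the `low` and `high` thresholds are all assumptions chosen for the example.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        # Use a set so a term is counted at most once per document.
        df.update(set(doc.lower().split()))
    return df


def classify_terms(docs, low=0.1, high=0.5):
    """Label each term by the share of documents in which it occurs.

    The thresholds `low` and `high` are hypothetical; a real collection
    would need values chosen empirically.
    """
    n = len(docs)
    classes = {}
    for term, count in document_frequency(docs).items():
        share = count / n
        if share > high:
            classes[term] = "too frequent"
        elif share < low:
            classes[term] = "too rare"
        else:
            classes[term] = "medium"
    return classes


docs = ["the cat sat", "the dog ran", "the bird flew", "the cat ran"]
labels = classify_terms(docs, low=0.3, high=0.75)
for term in sorted(labels):
    print(term, labels[term])
```

Terms landing in the outer bands would be the candidates for the kind of transformation the abstract mentions; how they are transformed is left open here because the abstract does not specify it.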
Index Terms
- On the construction of effective vocabularies for information retrieval
Proceedings of ACM SIGPLAN - SIGIR interface meeting
SIGPLAN '73: Proceedings of the 1973 meeting on Programming languages and information retrieval