ABSTRACT
This paper examines the importance of good weighting methods in information retrieval, that is, methods that stress the most useful features of a document or query representative. Evidence is presented that good weighting methods are more important than the feature selection process, and it is suggested that the two must go hand in hand to be effective. The paper concludes with a method for learning a good weight for a term based on the characteristics of that term.
- The importance of proper weighting methods