ABSTRACT
Information filtering systems based on statistical retrieval models usually compute a numeric score indicating how well each document matches each profile. Documents with scores above profile-specific dissemination thresholds are delivered.
An optimal dissemination threshold is one that maximizes a given utility function based on the distributions of the scores of relevant and non-relevant documents. The parameters of these distributions can be estimated using relevance information, but relevance information obtained while filtering is biased. This paper presents a new method of adjusting dissemination thresholds that explicitly models and compensates for this bias. The new algorithm, which is based on the Maximum Likelihood principle, jointly estimates the parameters of the density distributions for relevant and non-relevant documents and the proportion of relevant documents in the corpus. Experiments with TREC-8 and TREC-9 Filtering Track data demonstrate the effectiveness of the algorithm.
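The idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation, and the abstract does not name the distribution families or the utility function, so the following are assumptions made only for illustration: relevant scores follow a Gaussian, non-relevant scores follow an exponential on non-negative scores, the utility is linear (a hypothetical `gain` per delivered relevant document and `cost` per delivered non-relevant one), and the bias arises because judgments exist only for documents scoring at or above the current threshold `theta`. Under those assumptions, the likelihood of each judged document is conditioned on having been delivered, and the threshold is chosen by a simple grid search over expected utility. All function and parameter names are hypothetical.

```python
import math

def normal_pdf(s, mu, sigma):
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_sf(t, mu, sigma):
    # P(score >= t) for a Gaussian score distribution (assumed family)
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))

def exp_pdf(s, lam):
    return lam * math.exp(-lam * s) if s >= 0 else 0.0

def exp_sf(t, lam):
    # P(score >= t) for an exponential score distribution (assumed family)
    return math.exp(-lam * t) if t >= 0 else 1.0

def truncated_log_likelihood(judged, theta, mu, sigma, lam, p):
    """Log-likelihood of judged documents, conditioned on score >= theta.

    judged: list of (score, is_relevant) pairs for delivered documents.
    p: assumed proportion of relevant documents in the corpus.
    """
    # Probability that an arbitrary document is delivered at all;
    # dividing by it compensates for seeing only scores above theta.
    delivered = p * normal_sf(theta, mu, sigma) + (1.0 - p) * exp_sf(theta, lam)
    total = 0.0
    for score, is_relevant in judged:
        if is_relevant:
            joint = p * normal_pdf(score, mu, sigma)
        else:
            joint = (1.0 - p) * exp_pdf(score, lam)
        total += math.log(joint / delivered)
    return total

def expected_utility(t, mu, sigma, lam, p, gain, cost):
    # Expected utility per incoming document when delivering scores >= t
    return gain * p * normal_sf(t, mu, sigma) - cost * (1.0 - p) * exp_sf(t, lam)

def best_threshold(mu, sigma, lam, p, gain, cost, grid):
    # Grid search stands in for whatever optimizer a real system would use
    return max(grid, key=lambda t: expected_utility(t, mu, sigma, lam, p, gain, cost))

if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]
    judged = [(0.58, True), (0.60, True), (0.62, True), (0.46, False), (0.47, False)]
    print(truncated_log_likelihood(judged, 0.45, 0.6, 0.1, 10.0, 0.05))
    print(best_threshold(0.6, 0.1, 10.0, 0.05, 2.0, 1.0, grid))
```

In a full system, `truncated_log_likelihood` would be maximized over `mu`, `sigma`, `lam`, and `p` with a numerical optimizer, and the fitted values would then be passed to `best_threshold`; the sketch leaves that optimization step out.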
Index Terms
- Maximum likelihood estimation for filtering thresholds