Article

Maximum likelihood estimation for filtering thresholds

Authors:
Yi Zhang, Carnegie Mellon Univ., Pittsburgh, PA
Jamie Callan, Carnegie Mellon Univ., Pittsburgh, PA

SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, September 2001, pages 294–302. https://doi.org/10.1145/383952.384012

Published: 01 September 2001

ABSTRACT

Information filtering systems based on statistical retrieval models usually compute a numeric score indicating how well each document matches each profile. Documents with scores above profile-specific dissemination thresholds are delivered.
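A minimal Python sketch of the dissemination step just described, assuming each profile's document scores have already been computed. The function name `disseminate`, the dictionary layout, and the strict `>` comparison are illustrative choices, not details taken from the paper.

```python
from typing import Dict, List

def disseminate(scores: Dict[str, Dict[str, float]],
                thresholds: Dict[str, float]) -> Dict[str, List[str]]:
    """Hypothetical helper: scores[profile][doc_id] -> score; returns, per profile,
    the sorted IDs of documents whose score exceeds that profile's threshold."""
    return {
        profile: sorted(doc for doc, s in doc_scores.items() if s > thresholds[profile])
        for profile, doc_scores in scores.items()
    }

scores = {"p1": {"d1": 0.42, "d2": 0.17, "d3": 0.55}, "p2": {"d1": 0.08, "d2": 0.31}}
print(disseminate(scores, {"p1": 0.40, "p2": 0.30}))  # {'p1': ['d1', 'd3'], 'p2': ['d2']}
```

In a real filtering system documents arrive one at a time and each is compared against every profile's current threshold; the batch form here is only for brevity.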

An optimal dissemination threshold is one that maximizes a given utility function based on the distributions of the scores of relevant and non-relevant documents. The parameters of these distributions can be estimated using relevance information, but relevance information obtained while filtering is biased, because only documents that were delivered (that is, that scored above the threshold) are ever judged. This paper presents a new method of adjusting dissemination thresholds that explicitly models and compensates for this bias. The new algorithm, which is based on the Maximum Likelihood principle, jointly estimates the parameters of the density distributions for relevant and non-relevant documents and the ratio of relevant documents in the corpus. Experiments with TREC-8 and TREC-9 Filtering Track data demonstrate the effectiveness of the algorithm.
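The abstract does not spell out the model, so the following Python is only a sketch of the general idea under stated assumptions. It assumes a normal density for relevant-document scores, an exponential density (on non-negative scores) for non-relevant ones, and a single relevant-document ratio `p`; none of these choices is given in the text above. The key step is in `neg_log_likelihood`: each judged document's likelihood is divided by the probability of clearing the threshold that was in force when it was scored, which is what corrects for the fact that only delivered documents are judged. The Nelder-Mead optimiser, the helper names (`estimate`, `best_threshold`, `simulate`), the synthetic parameters, and the linear utility weights `credit=2.0` and `penalty=1.0` are all hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from statistics import NormalDist
from typing import Callable, List, Sequence, Tuple

# Sketch only. ASSUMED model: relevant scores ~ Normal(mu, sigma^2), non-relevant
# scores ~ Exponential(lam) on [0, inf), p = fraction of relevant documents.
# All names and the Nelder-Mead optimiser are hypothetical stand-ins.
Judged = Tuple[float, float, bool]  # (score, threshold in force when scored, relevant?)

def exp_sf(x: float, lam: float) -> float:
    return math.exp(-lam * max(x, 0.0))  # P(X >= x) for Exponential(lam) on [0, inf)

def unpack(v: Sequence[float]) -> Tuple[float, float, float, float]:
    """Unconstrained coordinates -> (mu, sigma, lam, p), clamped to avoid overflow."""
    sigma = math.exp(min(max(v[1], -10.0), 10.0))
    lam = math.exp(min(max(v[2], -10.0), 10.0))
    return v[0], sigma, lam, 1.0 / (1.0 + math.exp(-min(max(v[3], -30.0), 30.0)))

def neg_log_likelihood(v: Sequence[float], judged: List[Judged]) -> float:
    """Each judged document is conditioned on having cleared the threshold it faced."""
    mu, sigma, lam, p = unpack(v)
    rel = NormalDist(mu, sigma)
    log_above = {th: math.log(max(p * (1 - rel.cdf(th)) + (1 - p) * exp_sf(th, lam), 1e-300))
                 for th in {t for _, t, _ in judged}}  # log P(score >= threshold)
    nll = 0.0
    for score, theta, relevant in judged:
        if relevant:
            dens = p * rel.pdf(score)
        else:
            dens = (1 - p) * lam * math.exp(-lam * score) if score >= 0 else 0.0
        nll -= math.log(max(dens, 1e-300)) - log_above[theta]
    return nll

def nelder_mead(f: Callable[[List[float]], float], x0: Sequence[float],
                step: float = 0.5, iters: int = 5000, tol: float = 1e-8) -> List[float]:
    """Minimal Nelder-Mead: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    pts = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                        for i in range(n)]
    vals = [f(q) for q in pts]
    for _ in range(iters):
        order = sorted(range(n + 1), key=vals.__getitem__)
        pts, vals = [pts[k] for k in order], [vals[k] for k in order]
        if vals[-1] - vals[0] < tol:
            break
        cen = [sum(q[j] for q in pts[:-1]) / n for j in range(n)]
        worst = pts[-1]
        refl = [2 * c - w for c, w in zip(cen, worst)]
        fr = f(refl)
        if fr < vals[0]:
            expd = [3 * c - 2 * w for c, w in zip(cen, worst)]
            fe = f(expd)
            pts[-1], vals[-1] = (expd, fe) if fe < fr else (refl, fr)
        elif fr < vals[-2]:
            pts[-1], vals[-1] = refl, fr
        else:
            con = [0.5 * (c + w) for c, w in zip(cen, worst)]
            fc = f(con)
            if fc < vals[-1]:
                pts[-1], vals[-1] = con, fc
            else:  # shrink every vertex halfway towards the best one
                pts = [pts[0]] + [[b + 0.5 * (x - b) for b, x in zip(pts[0], q)] for q in pts[1:]]
                vals = [vals[0]] + [f(q) for q in pts[1:]]
    return pts[min(range(n + 1), key=vals.__getitem__)]

def estimate(judged: List[Judged]) -> Tuple[float, float, float, float]:
    rel = [s for s, _, r in judged if r]
    non = [s for s, _, r in judged if not r]
    mu0 = sum(rel) / len(rel)  # naive (biased) estimates serve only as a starting point
    sd0 = math.sqrt(sum((s - mu0) ** 2 for s in rel) / len(rel))
    lam0, p0 = len(non) / sum(non), len(rel) / len(judged)
    x0 = [mu0, math.log(sd0), math.log(lam0), math.log(p0 / (1.0 - p0))]
    return unpack(nelder_mead(lambda v: neg_log_likelihood(v, judged), x0))

def best_threshold(mu: float, sigma: float, lam: float, p: float,
                   credit: float = 2.0, penalty: float = 1.0) -> float:
    """Threshold on a [0, 2] grid maximising expected linear utility per incoming document."""
    rel = NormalDist(mu, sigma)
    def utility(t: float) -> float:
        return credit * p * (1 - rel.cdf(t)) - penalty * (1 - p) * exp_sf(t, lam)
    return max((i / 1000.0 for i in range(2001)), key=utility)

def simulate(n: int, seed: int = 0, theta: float = 0.3) -> List[Judged]:
    """Synthetic stream (mu=0.6, sigma=0.1, lam=10, p=0.05); only delivered docs are judged."""
    rng = random.Random(seed)
    judged = []
    for _ in range(n):
        relevant = rng.random() < 0.05
        score = rng.gauss(0.6, 0.1) if relevant else rng.expovariate(10.0)
        if score > theta:
            judged.append((score, theta, relevant))
    return judged

if __name__ == "__main__":
    mu, sigma, lam, p = estimate(simulate(10_000))
    print(f"mu={mu:.3f} sigma={sigma:.3f} lam={lam:.2f} p={p:.3f}")
    print("threshold:", best_threshold(mu, sigma, lam, p))
```

On this synthetic stream roughly half of the judged documents are relevant even though only 5% of the stream is, so the naive judged-sample ratio overstates `p` about tenfold; the conditional likelihood should land close to 0.05 instead. `best_threshold` then picks the threshold that maximises expected utility under the fitted densities, using a grid search rather than solving the first-order condition analytically.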

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking

Published in
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
September 2001
454 pages
ISBN:1581133316
DOI:10.1145/383952
Chairmen:
Donald H. Kraft, Louisiana State Univ.
W. Bruce Croft, University of Massachusetts (for the Americas)
David J. Harper, The Robert Gordon University (for Europe and Africa)
Justin Zobel, RMIT University (for Asia and Australasia)
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
SIGIR '01 paper acceptance rate: 47 of 201 submissions, 23%
Overall acceptance rate: 792 of 3,983 submissions, 20%

Article Metrics
- Total citations: 67
- Total downloads: 530
- Downloads (last 12 months): 21
- Downloads (last 6 weeks): 2