DOI: 10.1145/383952.384012
Article

Maximum likelihood estimation for filtering thresholds

Published: 01 September 2001

ABSTRACT

Information filtering systems based on statistical retrieval models usually compute a numeric score indicating how well each document matches each profile. Documents with scores above profile-specific dissemination thresholds are delivered.
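A minimal sketch of the dissemination step just described, assuming the retrieval model has already produced a score for every (profile, document) pair. The function name `disseminate` and the nested-dictionary layout are hypothetical illustrations, not taken from the paper; a document is delivered only when its score is strictly above that profile's own threshold.

```python
from typing import Dict, List


def disseminate(scores: Dict[str, Dict[str, float]],
                thresholds: Dict[str, float]) -> Dict[str, List[str]]:
    """Return, per profile, the documents whose score exceeds that profile's threshold.

    Hypothetical layout: scores[profile][doc_id] is the retrieval-model score of
    doc_id for profile, and thresholds[profile] is that profile's dissemination threshold.
    """
    delivered: Dict[str, List[str]] = {}
    for profile, doc_scores in scores.items():
        theta = thresholds[profile]  # each profile has its own threshold
        delivered[profile] = sorted(d for d, s in doc_scores.items() if s > theta)
    return delivered


if __name__ == "__main__":
    scores = {"p1": {"d1": 0.9, "d2": 0.4}, "p2": {"d1": 0.3, "d2": 0.6}}
    print(disseminate(scores, {"p1": 0.5, "p2": 0.5}))  # {'p1': ['d1'], 'p2': ['d2']}
```

Because each profile carries its own threshold, the same document can be delivered to one profile and withheld from another, as `d1` is here.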

An optimal dissemination threshold is one that maximizes a given utility function based on the distributions of the scores of relevant and non-relevant documents. The parameters of these distributions can be estimated using relevance information, but relevance information obtained while filtering is biased: judgments are available only for documents that were delivered, that is, documents whose scores exceeded the threshold in force at the time. This paper presents a new method of adjusting dissemination thresholds that explicitly models and compensates for this bias. The new algorithm, which is based on the Maximum Likelihood principle, jointly estimates the parameters of the density distributions for relevant and non-relevant documents and the ratio of relevant documents in the corpus. Experiments with TREC-8 and TREC-9 Filtering Track data demonstrate the effectiveness of the algorithm.
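The abstract does not specify the density families, the utility function, or the optimizer, so the following is only a minimal sketch under loudly stated assumptions: relevant scores are taken to be Gaussian, non-relevant scores exponential on [0, ∞), the utility linear with hypothetical weights `credit=2.0` per relevant delivered document and `penalty=1.0` per non-relevant one, and a plain compass (pattern) search stands in for whatever optimizer the paper actually uses. Every name here (`ScoreModel`, `fit`, `optimal_threshold`, `simulate`) is hypothetical. The bias correction is the division of each observation's likelihood by P(S > theta), the probability that a document clears the threshold theta in force when it was judged, so the relevant fraction p and the parameters mu, sigma and lam are estimated jointly from the judged sample alone.

```python
# Sketch under stated assumptions; not the paper's implementation.
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# One judged document: (score, is_relevant, threshold in force when it was delivered).
Judgment = Tuple[float, bool, float]


@dataclass
class ScoreModel:
    p: float      # fraction of relevant documents in the whole stream
    mu: float     # mean of relevant scores (ASSUMED Gaussian)
    sigma: float  # std. dev. of relevant scores
    lam: float    # rate of non-relevant scores (ASSUMED exponential on [0, inf))


def _log_sigmoid(a: float) -> float:
    # Numerically stable log(1 / (1 + exp(-a))).
    return -math.log1p(math.exp(-a)) if a >= 0 else a - math.log1p(math.exp(a))


def _norm_sf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))  # P(S > x), Gaussian


def _exp_sf(x: float, lam: float) -> float:
    return math.exp(-lam * max(x, 0.0))  # P(S > x), exponential on [0, inf)


def _unpack(x: Sequence[float]) -> ScoreModel:
    # Unconstrained parameters: logit(p), mu, log(sigma), log(lam).
    return ScoreModel(math.exp(_log_sigmoid(x[0])), x[1], math.exp(x[2]), math.exp(x[3]))


def neg_log_likelihood(x: Sequence[float], data: List[Judgment]) -> float:
    m = _unpack(x)
    log_p, log_1mp = _log_sigmoid(x[0]), _log_sigmoid(-x[0])
    nll = 0.0
    for score, relevant, theta in data:
        if relevant:
            z = (score - m.mu) / m.sigma
            log_joint = log_p - 0.5 * z * z - math.log(m.sigma * math.sqrt(2 * math.pi))
        else:
            log_joint = log_1mp + math.log(m.lam) - m.lam * score
        # Bias correction: a judgment exists only because the score cleared theta,
        # so each observation is conditioned on that event.
        passed = m.p * _norm_sf(theta, m.mu, m.sigma) + (1 - m.p) * _exp_sf(theta, m.lam)
        nll -= log_joint - math.log(max(passed, 1e-300))
    return nll


def fit(data: List[Judgment], step: float = 0.25, tol: float = 1e-3,
        max_iter: int = 2000) -> ScoreModel:
    # Needs >= 2 judged relevant documents (distinct scores) and >= 1 non-relevant one.
    rel = [s for s, r, _ in data if r]
    non = [s for s, r, _ in data if not r]
    mu0 = sum(rel) / len(rel)
    sd0 = math.sqrt(sum((s - mu0) ** 2 for s in rel) / len(rel))
    # Start from the naive (biased) estimates computed directly on the judged sample.
    x = [math.log(len(rel) / len(non)), mu0, math.log(sd0), math.log(len(non) / sum(non))]
    fx = neg_log_likelihood(x, data)
    while step > tol and max_iter > 0:  # compass search: try +/- step per coordinate
        max_iter -= 1
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] += delta
                fy = neg_log_likelihood(y, data)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2  # no axis move helped: refine the step
    return _unpack(x)


def optimal_threshold(m: ScoreModel, credit: float = 2.0, penalty: float = 1.0,
                      lo: float = 0.0, hi: float = 10.0, steps: int = 2000) -> float:
    # Expected utility per incoming document when delivering every score above t.
    def utility(t: float) -> float:
        return (credit * m.p * _norm_sf(t, m.mu, m.sigma)
                - penalty * (1 - m.p) * _exp_sf(t, m.lam))
    return max((lo + (hi - lo) * k / steps for k in range(steps + 1)), key=utility)


def simulate(n: int, m: ScoreModel, theta: float, seed: int = 0) -> List[Judgment]:
    # Only documents scoring above theta are delivered, and therefore judged.
    rng = random.Random(seed)
    out: List[Judgment] = []
    for _ in range(n):
        relevant = rng.random() < m.p
        s = rng.gauss(m.mu, m.sigma) if relevant else rng.expovariate(m.lam)
        if s > theta:
            out.append((s, relevant, theta))
    return out
```

As an illustration, `fit(simulate(3000, ScoreModel(p=0.1, mu=3.0, sigma=1.0, lam=2.0), theta=1.0))` should land near the generating parameters, whereas the naive starting values are far off (roughly 0.45 for the relevant fraction and 0.67 for lam) because every judged score lies above the threshold. `optimal_threshold` scans a grid rather than solving credit · p · f_rel(t) = penalty · (1 − p) · f_non(t), the first-order condition for this linear utility.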


      • Published in

        ACM Conferences
        SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
        September 2001
        454 pages
        ISBN: 1581133316
        DOI: 10.1145/383952

        Copyright © 2001 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States




        Acceptance Rates

        SIGIR '01 Paper Acceptance Rate: 47 of 201 submissions, 23%. Overall Acceptance Rate: 792 of 3,983 submissions, 20%.
