Abstract
Historically, solutions to the TREC filtering tasks have focused exclusively on the content of documents and search topic descriptions as training data. These approaches are well known for their ability to focus on the salient concepts in the document stream that are most useful for separating relevant documents from irrelevant ones. However, one kind of information has not been used: the relationships among the topics themselves. In our TREC-8 routing experiments, we employed a collaborative (or social) filtering algorithm based on latent semantic indexing, which highlights common term usage patterns among groups of filtering profiles. Our hypothesis was that this would allow related topics to share common relevant documents. We found, however, that the algorithm also recommends many documents that are related to a topic yet not relevant to it. As a result of this process, many similar search topics are “linked” together by common sets of documents recommended to them. We visualize these topic relationships using graphs in which topics are nodes and an edge exists wherever two topics share a recommended document.
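The topic graph described in the abstract can be illustrated with a minimal sketch. It assumes the filtering output is available as a mapping from each topic to the set of documents recommended to it. The function name `build_topic_graph`, the input format, and the sample topic and document identifiers are hypothetical, chosen only for illustration, and are not taken from the article.

```python
from collections import defaultdict
from itertools import combinations


def build_topic_graph(recommendations):
    """Link two topics whenever they share a recommended document.

    `recommendations` maps a topic id to the set of document ids
    recommended to that topic (hypothetical input format).
    Returns a dict mapping each edge (topic_a, topic_b), with
    topic_a < topic_b, to the set of documents the two topics share.
    """
    # Invert the mapping: for each document, which topics received it?
    doc_to_topics = defaultdict(set)
    for topic, docs in recommendations.items():
        for doc in docs:
            doc_to_topics[doc].add(topic)

    # Every pair of topics that received the same document gets an edge.
    edges = defaultdict(set)
    for doc, topics in doc_to_topics.items():
        for a, b in combinations(sorted(topics), 2):
            edges[(a, b)].add(doc)
    return dict(edges)


# Made-up sample data, for illustration only.
recommendations = {
    "topic_A": {"doc1", "doc2"},
    "topic_B": {"doc2", "doc3"},
    "topic_C": {"doc4"},
}
graph = build_topic_graph(recommendations)
```

In this sample, `topic_A` and `topic_B` are linked because both received `doc2`, while `topic_C` shares nothing and has no edges. Storing the shared documents on each edge is a design choice of this sketch; it makes it possible to check afterwards whether a link rests on relevant documents or only on related ones.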
Cite this article
Soboroff, I.M., Nicholas, C.K. Related, but not Relevant: Content-Based Collaborative Filtering in TREC-8. Information Retrieval 5, 189–208 (2002). https://doi.org/10.1023/A:1015797928606