Estimating the Influence of Documents in IR Systems: A Marked Indexing Approach

  • Conference paper
Computational Science and Its Applications – ICCSA 2010 (ICCSA 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6019)


Abstract

In modern information retrieval (IR) systems, scoring functions are widely used to sort results. For a given document, its rank in the sorted result lists for hot searches can be regarded as its influence. When a new document arrives, can such an IR system evaluate its influence before the document is inserted into the corpus? Current IR systems built on inverted indexes may not handle this problem well. In this paper, we propose an influence measure based on a document's global rank, and we extend the inverted index structure with position milestones to speed up the rank calculation. A performance study on both real and synthetic data verifies the effectiveness and efficiency of our method.
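The abstract gives no implementation details, so the following Python fragment is only a rough sketch of how milestones on a score-sorted posting list could speed up a rank lookup. Everything in it is an assumption made for illustration: the class `MarkedPostingList`, the `step` parameter, the fixed-interval placement of milestones, and the `mean_rank` aggregate over hot query terms are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


class MarkedPostingList:
    """Scores for one query term, sorted descending, with position milestones.

    Hypothetical structure: the paper's actual index layout is not shown here.
    """

    def __init__(self, scores, step=4):
        self.scores = sorted(scores, reverse=True)
        self.step = step  # assumed: one milestone every `step` postings
        # Milestone i stores the score found at position i * step.
        # Values are negated so that bisect can search an ascending list.
        self._neg_milestones = [-s for s in self.scores[::step]]

    def rank_of(self, score):
        """Return the 1-based rank a new document with `score` would take."""
        # Number of milestones whose score is strictly greater than `score`.
        ahead = bisect_left(self._neg_milestones, -score)
        if ahead == 0:
            return 1  # no existing score is strictly higher
        # All postings before the last such milestone score higher,
        # so only the block that starts at that milestone is scanned.
        start = (ahead - 1) * self.step
        block = self.scores[start:start + self.step]
        higher = sum(1 for s in block if s > score)
        return start + higher + 1


def mean_rank(index, doc_scores):
    """Assumed aggregate: average rank of a new document over hot terms."""
    ranks = [index[t].rank_of(s) for t, s in doc_scores.items() if t in index]
    return sum(ranks) / len(ranks) if ranks else None
```

Under these assumptions a lookup touches the milestone array plus at most `step` postings, rather than the whole list, and the corpus is not modified. Ties are resolved in favour of the new document, which is also an arbitrary choice.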

The research of Yi Han was supported in part by the China National High-tech R&D Program (863 Program) under Grant No. 2007AA010502 and by the National Natural Science Foundation of China under Grant Nos. 60873204 and 60933005. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Y., Han, Y., Lu, T. (2010). Estimating the Influence of Documents in IR Systems: A Marked Indexing Approach. In: Taniar, D., Gervasi, O., Murgante, B., Pardede, E., Apduhan, B.O. (eds) Computational Science and Its Applications – ICCSA 2010. ICCSA 2010. Lecture Notes in Computer Science, vol 6019. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12189-0_23


  • DOI: https://doi.org/10.1007/978-3-642-12189-0_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12188-3

  • Online ISBN: 978-3-642-12189-0

  • eBook Packages: Computer Science, Computer Science (R0)
