Measuring and Improving Access to the Corpus

  • Chapter
Current Challenges in Patent Information Retrieval

Part of the book series: The Information Retrieval Series ((INRE,volume 29))

Abstract

Retrievability is a measure of access that quantifies how easily documents can be found using a retrieval system. Such a measure is of particular interest within the patent domain: if a retrieval system makes some patents hard to find, a patent searcher could miss important and relevant patents because of the retrieval system itself. In this chapter, we describe measures of retrievability and how they can be applied to measure the overall access to a collection given a retrieval system. We then identify three features of best-match retrieval models that are hypothesised to lead to an improvement in access to all documents in the collection: sensitivity to term frequency, length normalization and convexity. Since patent searchers tend to favour Boolean models over best-match models, hybrid retrieval models are proposed that incorporate these features while preserving the desirable aspects of the traditional Boolean model. An empirical study conducted on four large patent corpora demonstrates that these hybrid models provide better access to the corpus of patents than the traditional Boolean model.
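
As a rough illustration of what such a measure might look like, the following Python sketch counts, for each document, how many queries return it within a rank cutoff, and then summarises the inequality of those counts with a Gini coefficient. The abstract does not specify the chapter's exact formulation, so the cutoff-based score, the use of the Gini coefficient as the summary, and all names and data here (`retrievability`, `gini`, `rankings`, `collection`, the toy patent identifiers) are assumptions made for illustration only.

```python
from collections import Counter


def retrievability(rankings, cutoff):
    """Count, per document, the queries that return it in the top `cutoff` ranks.

    `rankings` maps a query id to its ranked list of document ids
    (hypothetical input format, assumed for this sketch).
    """
    scores = Counter()
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal access."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted form: sum((2i - n - 1) * x_i) / (n * sum(x_i)).
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy data (invented): four patents and three queries with ranked results.
collection = ["p1", "p2", "p3", "p4"]
rankings = {
    "q1": ["p1", "p2", "p3"],
    "q2": ["p1", "p3", "p4"],
    "q3": ["p1", "p2", "p4"],
}

scores = retrievability(rankings, cutoff=2)
# Include documents that were never retrieved, with a score of zero.
per_doc = [scores[doc_id] for doc_id in collection]
inequality = gini(per_doc)
print(per_doc, round(inequality, 3))
```

Under these assumptions, a lower Gini value would indicate that the retrieval system spreads access more evenly across the collection, which is the sense in which one model could be said to provide better access than another.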

Notes

  1. Paradoxically the output of such a model is either 1 or 0 and this contains less information than the real number yielded by best-match models.

Acknowledgements

The work described in this chapter was supported and partly funded by Matrixware. I would like to thank the Information Retrieval Facility for their computation services. I would also like to thank Leif Azzopardi, Tamara Polajnar, Richard Glassey and Desmond Elliott for their helpful comments and suggestions on how to improve this work.

Author information

Correspondence to Richard Bache.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bache, R. (2011). Measuring and Improving Access to the Corpus. In: Lupu, M., Mayer, K., Tait, J., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 29. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19231-9_7

  • DOI: https://doi.org/10.1007/978-3-642-19231-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19230-2

  • Online ISBN: 978-3-642-19231-9

  • eBook Packages: Computer Science, Computer Science (R0)
