Performance Comparison of Ad-Hoc Retrieval Models over Full-Text vs. Titles of Documents

  • Conference paper
Maturity and Innovation in Digital Libraries (ICADL 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11279)

Abstract

While there are many studies on information retrieval models using full-text, there are presently no studies comparing full-text retrieval with retrieval over only the titles of documents. On the one hand, the full-text of documents such as scientific papers is not always available due to, e.g., copyright policies of academic publishers. On the other hand, conducting a search based on titles alone has strong limitations: titles are short and therefore may not contain enough information to yield satisfactory search results. In this paper, we compare different retrieval models regarding their search performance on the full-text vs. only the titles of documents. We use different datasets, including three digital library datasets: EconBiz, IREON, and PubMed. The results show that it is possible to build effective title-based retrieval models that provide results competitive with full-text retrieval. On average, the evaluation results of the best title-based retrieval models are only 3% lower than those of the best full-text-based retrieval models.
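The title-only limitation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the `rank` helper, and the choice of a plain TF-IDF cosine ranker are all assumptions made for illustration, and the paper's actual models and datasets are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy corpus; each full text repeats the title, then adds body text.
DOCS = [
    {"title": "Monetary policy and inflation",
     "full_text": "Monetary policy and inflation. Central banks adjust "
                  "interest rates to keep inflation stable."},
    {"title": "Labour market dynamics",
     "full_text": "Labour market dynamics. Unemployment responds to "
                  "interest rates and to wage inflation."},
    {"title": "Deep learning for ranking",
     "full_text": "Deep learning for ranking. Neural networks learn to "
                  "rank documents for a query."},
]


def tokenize(text):
    """Lowercase whitespace tokenizer that strips trailing punctuation."""
    return [t.strip(".,").lower() for t in text.split() if t.strip(".,")]


def rank(query, docs, field):
    """Return indices of docs with a non-zero TF-IDF cosine score on `field`,
    best match first. `field` is either "title" or "full_text"."""
    tokenized = [tokenize(d[field]) for d in docs]
    n = len(docs)
    # Document frequency of each term within the chosen field only.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vec(tokens):
        tf = Counter(tokens)
        # Terms unseen in the indexed field are dropped.
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(tokenize(query))
    scored = [(cosine(q, vec(toks)), i) for i, toks in enumerate(tokenized)]
    return [i for s, i in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]


if __name__ == "__main__":
    for field in ("title", "full_text"):
        print(field, rank("interest rates", DOCS, field))
```

In this toy corpus, the query "interest rates" matches no title but matches the full text of two documents, which is the kind of gap a title-based model has to close.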

Notes

  1. https://bitbucket.org/a_saleh/icadl2018.

Acknowledgement

This work was supported by the EU’s Horizon 2020 programme under grant agreement H2020-693092 MOVING.

Author information

Corresponding author

Correspondence to Ahmed Saleh.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Saleh, A., Beck, T., Galke, L., Scherp, A. (2018). Performance Comparison of Ad-Hoc Retrieval Models over Full-Text vs. Titles of Documents. In: Dobreva, M., Hinze, A., Žumer, M. (eds) Maturity and Innovation in Digital Libraries. ICADL 2018. Lecture Notes in Computer Science, vol 11279. Springer, Cham. https://doi.org/10.1007/978-3-030-04257-8_30

  • DOI: https://doi.org/10.1007/978-3-030-04257-8_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04256-1

  • Online ISBN: 978-3-030-04257-8

  • eBook Packages: Computer Science (R0)
