
PROMISE Retreat Report: Prospects and Opportunities for Information Access Evaluation

Published: 21 December 2012

Abstract

The PROMISE network of excellence organized a two-day brainstorming workshop on 30th and 31st May 2012 in Padua, Italy, to discuss and envisage future directions and perspectives for the evaluation of information access and retrieval systems in multiple languages and multiple media. This document reports on the outcomes of the event and details the six envisaged research lines: search applications; contextual evaluation; challenges in test collection design and exploitation; component-based evaluation; ongoing evaluation; and signal-aware evaluation. The ultimate goal of the PROMISE retreat is to stimulate and engage the research community along these research lines and to provide funding agencies with effective and scientifically sound ideas for coordinating and supporting information access research.

References

  1. M. Agosti, R. Berendsen, T. Bogers, M. Braschler, P. Buitelaar, K. Choukri, G. M. Di Nunzio, N. Ferro, P. Forner, A. Hanbury, K. Friberg Heppin, P. Hansen, A. Järvelin, B. Larsen, M. Lupu, I. Masiero, H. Müller, S. Peruzzo, V. Petras, F. Piroi, M. de Rijke, G. Santucci, G. Silvello, and E. Toms. PROMISE Retreat Report -- Prospects and Opportunities for Information Access Evaluation. PROMISE network of excellence, ISBN 978-88-6321-039-2, http://www.promise-noe.eu/promise-retreat-report-2012/, September 2012.
  2. M. Agosti, M. Braschler, E. Di Buccio, M. Dussin, N. Ferro, G. L. Granato, I. Masiero, E. Pianta, G. Santucci, G. Silvello, and G. Tino. Deliverable D3.2 -- Specification of the evaluation infrastructure based on user requirements. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/fdf43394-0997-4638-9f99-38b2e9c63802, August 2011.
  3. M. Agosti, E. Di Buccio, N. Ferro, I. Masiero, M. Nicchio, S. Peruzzo, and G. Silvello. Deliverable D3.3 -- Prototype of the Evaluation Infrastructure. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/3783730a-bce3-481b-83df-48e209c6286a, September 2012.
  4. M. Agosti, E. Di Buccio, N. Ferro, I. Masiero, S. Peruzzo, and G. Silvello. DIRECTions: Design and Specification of an IR Evaluation Infrastructure. In Catarci et al. {24}.
  5. M. Agosti, G. M. Di Nunzio, and N. Ferro. A Proposal to Extend and Enrich the Scientific Data Curation of Evaluation Campaigns. In T. Sakai, M. Sanderson, and D. K. Evans, editors, Proc. 1st International Workshop on Evaluating Information Access (EVIA 2007), pages 62--73. National Institute of Informatics, Tokyo, Japan, 2007.
  6. M. Agosti, G. M. Di Nunzio, and N. Ferro. The Importance of Scientific Data Curation for Evaluation Campaigns. In C. Thanos, F. Borri, and L. Candela, editors, Digital Libraries: Research and Development. First International DELOS Conference. Revised Selected Papers, pages 157--166. Lecture Notes in Computer Science (LNCS) 4877, Springer, Heidelberg, Germany, 2007.
  7. M. Agosti and N. Ferro. Towards an Evaluation Infrastructure for DL Performance Evaluation. In G. Tsakonas and C. Papatheodorou, editors, Evaluation of Digital Libraries: An insight into useful applications and methods, pages 93--120. Chandos Publishing, Oxford, UK, 2009.
  8. M. Agosti, N. Ferro, C. Peters, M. de Rijke, and A. Smeaton, editors. Multilingual and Multimodal Information Access Evaluation. Proceedings of the International Conference of the Cross-Language Evaluation Forum (CLEF 2010). Lecture Notes in Computer Science (LNCS) 6360, Springer, Heidelberg, Germany, 2010.
  9. M. Agosti, N. Ferro, and C. Thanos. DESIRE 2011 Workshop on Data infrastructurEs for Supporting Information Retrieval Evaluation. SIGIR Forum, 46(1):51--55, June 2012.
  10. J. Allan, J. Aslam, L. Azzopardi, N. Belkin, P. Borlund, P. Bruza, J. Callan, C. Carman, M. Clarke, N. Craswell, W. B. Croft, J. S. Culpepper, F. Diaz, S. Dumais, N. Ferro, S. Geva, J. Gonzalo, D. Hawking, K. Järvelin, G. Jones, R. Jones, J. Kamps, N. Kando, E. Kanoulas, J. Karlgren, D. Kelly, M. Lease, J. Lin, S. Mizzaro, A. Moffat, V. Murdock, D. W. Oard, M. de Rijke, T. Sakai, M. Sanderson, F. Scholer, L. Si, J. Thom, P. Thomas, A. Trotman, A. Turpin, A. P. de Vries, W. Webber, X. Zhang, and Y. Zhang. Frontiers, Challenges, and Opportunities for Information Retrieval -- Report from SWIRL 2012, The Second Strategic Workshop on Information Retrieval in Lorne, February 2012. SIGIR Forum, 46(1):2--32, June 2012.
  11. M. Angelini, N. Ferro, G. L. Granato, and G. Santucci. Deliverable D5.3 -- Collaborative User Interface Prototype with Annotation Functionalities. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/8c475e6c-36b5-4822-9fbc-d7d116b3a897, September 2012.
  12. M. Angelini, N. Ferro, G. Santucci, and G. Silvello. Visual Interactive Failure Analysis: Supporting Users in Information Retrieval Evaluation. In J. Kamps, W. Kraaij, and N. Fuhr, editors, Proc. 4th Symposium on Information Interaction in Context (IIiX 2012), pages 195--203. ACM Press, New York, USA, 2012.
  13. T. G. Armstrong, A. Moffat, W. Webber, and J. Zobel. Improvements that don't add up: ad-hoc retrieval results since 1998. In D. W.-L. Cheung, I.-Y. Song, W. W. Chu, X. Hu, and J. J. Lin, editors, Proc. 18th International Conference on Information and Knowledge Management (CIKM 2009), pages 601--610. ACM Press, New York, USA, 2009.
  14. N. Asadi, D. Metzler, T. Elsayed, and J. Lin. Pseudo test collections for learning web search ranking functions. In W.-Y. Ma, J.-Y. Nie, R. Baeza-Yates, T.-S. Chua, and W. B. Croft, editors, Proc. 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2011), pages 1073--1082. ACM Press, New York, USA, 2011.
  15. L. Azzopardi, M. de Rijke, and K. Balog. Building simulated queries for known-item topics: an analysis using six European languages. In W. Kraaij, A. P. de Vries, C. L. A. Clarke, N. Fuhr, and N. Kando, editors, Proc. 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2007), pages 455--462. ACM Press, New York, USA, 2007.
  16. S. M. Beitzel, E. C. Jensen, A. Chowdhury, and D. Grossman. Using titles and category names from editor-driven taxonomies for automatic evaluation. In D. Kraft, O. Frieder, J. Hammer, S. Qureshi, and L. Seligman, editors, Proc. 12th International Conference on Information and Knowledge Management (CIKM 2003), pages 17--23. ACM Press, New York, USA, 2003.
  17. R. Berendsen, M. Braschler, M. Gäde, M. Kleineberg, M. Lupu, V. Petras, and S. Reitberger. Deliverable D4.3 -- Final Report on Alternative Evaluation Methodology. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/0092298d-892b-45c0-a534-b9a3d0c717b1, September 2012.
  18. R. Berendsen, E. Tsagkias, M. de Rijke, and E. Meij. Generating pseudo test collections for learning to rank scientific articles. In Catarci et al. {24}.
  19. M. Braschler, K. Choukri, N. Ferro, A. Hanbury, J. Karlgren, H. Müller, V. Petras, E. Pianta, M. de Rijke, and G. Santucci. A PROMISE for Experimental Evaluation. In Agosti et al. {8}, pages 140--144.
  20. M. Braschler, D. K. Harman, and E. Pianta, editors. CLEF 2010 Labs and Workshops, Notebook Papers. MINT srl, Trento, Italy, ISBN 978-88-904810-0-0, 2010.
  21. M. Braschler, S. Reitberger, M. Imhof, A. Järvelin, P. Hansen, M. Lupu, M. Gäde, R. Berendsen, and A. Garcia Seco de Herrera. Deliverable D2.3 -- Best Practices Report. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/086010bb-0d3f-46ef-946f-f0bbeef305e8, August 2012.
  22. P. Brereton, B. A. Kitchenham, D. Budgen, M. Turner, and M. Khalil. Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software, 80:571--583, 2007.
  23. V. R. Carvalho, M. Lease, and E. Yilmaz. Crowdsourcing for search evaluation. SIGIR Forum, 44(2):17--22, 2011.
  24. T. Catarci, P. Forner, D. Hiemstra, A. Peñas, and G. Santucci, editors. Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics. Proceedings of the Third International Conference of the CLEF Initiative (CLEF 2012). Lecture Notes in Computer Science (LNCS) 7488, Springer, Heidelberg, Germany, 2012.
  25. C. W. Cleverdon. Report on the testing and analysis of an investigation into the comparative efficiency of indexing systems. Technical report, Aslib Cranfield Research Project, 1962.
  26. C. W. Cleverdon. The Cranfield Tests on Index Languages Devices. In K. Spärck Jones and P. Willett, editors, Readings in Information Retrieval, pages 47--60. Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA, 1997.
  27. M. Croce, E. Di Buccio, E. Di Reto, M. Dussin, N. Ferro, G. L. Granato, P. Hansen, M. Lupu, M. Perlorca, A. Pronesti, A. Sabetta, G. Santucci, G. Silvello, G. Tino, and T. Tsikrika. Deliverable D5.2 -- User interface and Visual analytics environment requirements. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/21f1512a-5b47-48ae-834a-89d6441d079e, August 2011.
  28. E. Deelman, D. Gannon, M. Shields, and I. Taylor. Workflows and e-science: An overview of workflow system features and capabilities. Future Generation Computer Systems, 25(5):528--540, 2009.
  29. N. Ferro. DIRECT: the First Prototype of the PROMISE Evaluation Infrastructure for Information Retrieval Experimental Evaluation. ERCIM News, 86:54--55, July 2011.
  30. N. Ferro, A. Hanbury, H. Müller, and G. Santucci. Harnessing the Scientific Data Produced by the Experimental Evaluation of Search Engines and Information Access Systems. Procedia Computer Science, 4:740--749, 2011.
  31. A. Foncubierta Rodríguez and H. Müller. Ground truth generation in medical imaging: a crowdsourcing-based iterative approach. In W. T. Chu, M. Larson, W. T. Ooi, and K.-T. Chen, editors, Proc. International ACM Workshop on Crowdsourcing for Multimedia (CrowdMM 2012), 2012.
  32. P. Forner, J. Gonzalo, J. Kekäläinen, M. Lalmas, and M. de Rijke, editors. Multilingual and Multimodal Information Access Evaluation. Proceedings of the Second International Conference of the Cross-Language Evaluation Forum (CLEF 2011). Lecture Notes in Computer Science (LNCS) 6941, Springer, Heidelberg, Germany, 2011.
  33. P. Forner, J. Karlgren, and C. Womser-Hacker, editors. CLEF 2012 Labs and Workshops, Notebook Papers. MINT srl, Trento, Italy, ISBN 978-88-904810-1-7, 2012.
  34. A. Hanbury, H. Müller, G. Langs, M. A. Weber, B. H. Menze, and T. S. Fernandez. Bringing the algorithms to the data: cloud-based benchmarking for medical image analysis. In Catarci et al. {24}.
  35. A. Hanbury and H. Müller. Automated component-level evaluation: Present and future. In Agosti et al. {8}, pages 124--135.
  36. P. Hansen, G. L. Granato, and G. Santucci. Collecting and Assessing Collaborative Requirements. In C. Shah, P. Hansen, and R. Capra, editors, Proc. Workshop on Collaborative Information Seeking: Bridging the Gap between Theory and Practice (CIS 2011), 2011.
  37. P. Hansen and A. Järvelin. Collaborative Information Retrieval in an Information-intensive Domain. Information Processing & Management, 41(5):1101--1119, September 2005.
  38. D. K. Harman. Information Retrieval Evaluation. Morgan & Claypool Publishers, USA, 2011.
  39. D. K. Harman and E. M. Voorhees, editors. TREC. Experiment and Evaluation in Information Retrieval. MIT Press, Cambridge (MA), USA, 2005.
  40. B. Hefley and W. Murphy, editors. Service Science, Management, and Engineering: Education for the 21st Century. Springer, Heidelberg, Germany, 2008.
  41. B. Huurnink, K. Hofmann, M. de Rijke, and M. Bron. Validating query simulators: An experiment using commercial searches and purchases. In Agosti et al. {8}, pages 40--51.
  42. A. Järvelin, G. Eriksson, P. Hansen, T. Tsikrika, A. Garcia Seco de Herrera, M. Lupu, M. Gäde, V. Petras, S. Reitberger, M. Braschler, and R. Berendsen. Deliverable D2.2 -- Revised Specification of Evaluation Tasks. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/a0d664fe-16e4-4df6-bcf9-1dc3e5e8c18e, February 2012.
  43. G. Juve and E. Deelman. Scientific Workflows and Clouds. ACM Crossroads, 16(3):14--18, 2010.
  44. Y. Kano, P. Dobson, M. Nakanishi, J. Tsujii, and S. Ananiadou. Text mining meets workflow: linking U-Compare with Taverna. Bioinformatics, 26(19):2486--2487, 2010.
  45. D. Kelly. Methods for Evaluating Interactive Information Retrieval Systems with Users. Foundations and Trends in Information Retrieval (FnTIR), 3(1-2), 2009.
  46. S. Kumpulainen and K. Järvelin. Information Interaction in Molecular Medicine: Integrated Use of Multiple Channels. In N. J. Belkin and D. Kelly, editors, Proc. 3rd Symposium on Information Interaction in Context (IIiX 2010), pages 95--104. ACM Press, New York, USA, 2010.
  47. M. Lease and E. Yilmaz. Crowdsourcing for information retrieval. SIGIR Forum, 45(2):66--75, 2012.
  48. B. Mons, H. van Haagen, C. Chichester, P.-B. 't Hoen, J. T. den Dunnen, G. van Ommen, E. van Mulligen, B. Singh, R. Hooft, M. Roos, J. Hammond, B. Kiesel, B. Giardine, J. Velterop, P. Groth, and E. Schultes. The value of data. Nature Genetics, 43:281--283, 2011.
  49. V. Petras, P. Forner, and P. Clough, editors. CLEF 2011 Labs and Workshops, Notebook Papers. MINT srl, Trento, Italy, ISBN 978-88-904810-1-7, 2011.
  50. S. Reitberger, M. Imhof, M. Braschler, R. Berendsen, A. Järvelin, P. Hansen, A. Garcia Seco de Herrera, T. Tsikrika, M. Lupu, V. Petras, M. Gäde, M. Kleineberg, and K. Choukri. Deliverable D4.2 -- Tutorial on Evaluation in the Wild. PROMISE Network of Excellence, EU 7FP, Contract N. 258191. http://www.promise-noe.eu/documents/10156/3f546a0b-be7c-48df-b228-924cc5e185cb, August 2012.
  51. S. E. Robertson. On the history of evaluation in IR. Journal of Information Science, 34(4):439--456, 2008.
  52. B. R. Rowe, D. W. Wood, A. L. Link, and D. A. Simoni. Economic Impact Assessment of NIST's Text REtrieval Conference (TREC) Program. RTI Project Number 0211875, RTI International, USA. http://trec.nist.gov/pubs/2010.economic.impact.pdf, July 2010.
  53. M. Sanderson. Test Collection Based Evaluation of Information Retrieval Systems. Foundations and Trends in Information Retrieval (FnTIR), 4(4):247--375, 2010.
  54. J. Spohrer. Editorial Column -- Welcome to Our Declaration of Interdependence. Service Science, 1(1):i--ii, 2009.
  55. C. V. Thornley, A. C. Johnson, A. F. Smeaton, and H. Lee. The Scholarly Impact of TRECVid (2003-2009). Journal of the American Society for Information Science and Technology (JASIST), 62(4):613--627, April 2011.
  56. T. Tsikrika, A. Garcia Seco de Herrera, and H. Müller. Assessing the Scholarly Impact of ImageCLEF. In Forner et al. {32}, pages 95--106.
  57. Z. Xie, M. O. Ward, and E. A. Rundensteiner. Visual exploration of stream pattern changes using a data-driven framework. In Proceedings of the 6th International Conference on Advances in Visual Computing (ISVC 2010), Part II, pages 522--532. Springer-Verlag, Berlin, Heidelberg, 2010.



    • Published in

      ACM SIGIR Forum, Volume 46, Issue 2 (December 2012), 109 pages
      ISSN: 0163-5840
      DOI: 10.1145/2422256

      Copyright © 2012 held by the owner/author(s).

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 December 2012


      Qualifiers

      • review-article
