Query Expansion for Survey Question Retrieval in the Social Sciences

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9316)

Abstract

In recent years, the importance of research data and the need to archive and share it within the scientific community have increased enormously. This introduces a whole new set of challenges for digital libraries. In the social sciences, typical research data sets consist of surveys and questionnaires. In this paper, we focus on the use case of reusing social science survey questions and on mechanisms that support users in formulating queries for data sets. We describe and evaluate thesaurus- and co-occurrence-based approaches to query expansion that aim to improve retrieval quality in digital libraries and research data archives. The challenge here is to translate the information need and the underlying sociological phenomena into proper queries. We show that retrieval quality can be improved by adding related terms to the queries. In a direct comparison, automatically expanded queries using extracted co-occurring terms can provide better results than both queries manually reformulated by a domain expert and a keyword-based BM25 baseline.
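The idea of co-occurrence-based query expansion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which association measure or corpus was used, so the Jaccard score, the toy documents, the stopword list, and the function names `build_cooccurrence` and `expand_query` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list and toy "survey question" corpus (illustration only).
STOPWORDS = {"in", "and", "with", "the", "of"}
DOCS = [
    "trust in parliament and government",
    "trust in political institutions",
    "satisfaction with democracy and government",
    "income and employment status",
]


def build_cooccurrence(docs):
    """Count document frequencies of terms and of unordered term pairs."""
    term_df = Counter()
    pair_df = Counter()
    for doc in docs:
        terms = sorted({t for t in doc.lower().split() if t not in STOPWORDS})
        term_df.update(terms)
        # Pairs are stored in sorted order so (a, b) and (b, a) share one key.
        pair_df.update(combinations(terms, 2))
    return term_df, pair_df


def expand_query(query, term_df, pair_df, k=2):
    """Append the k terms most strongly associated with any query term.

    Association is the Jaccard coefficient over documents (an assumed
    measure, chosen here only because it is simple to compute).
    """
    query_terms = query.lower().split()
    scores = {}
    for term in term_df:
        if term in query_terms:
            continue
        for q in query_terms:
            both = pair_df.get(tuple(sorted((q, term))), 0)
            if both == 0:
                continue
            jaccard = both / (term_df[q] + term_df[term] - both)
            scores[term] = max(scores.get(term, 0.0), jaccard)
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return query_terms + ranked[:k]


term_df, pair_df = build_cooccurrence(DOCS)
expanded = expand_query("trust", term_df, pair_df, k=2)
print(expanded)
```

In this sketch the expanded term list would then be passed to the retrieval function (for example a BM25 ranker) in place of the original query.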

Authors are listed in alphabetical order.

Notes

  1. http://zacat.gesis.org/webview/.

  2. http://www.gesis.org/en/services/data-collection/zisehes/.

Author information

Corresponding author

Correspondence to Nadine Dulisch.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Dulisch, N., Kempf, A.O., Schaer, P. (2015). Query Expansion for Survey Question Retrieval in the Social Sciences. In: Kapidakis, S., Mazurek, C., Werla, M. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2015. Lecture Notes in Computer Science, vol 9316. Springer, Cham. https://doi.org/10.1007/978-3-319-24592-8_3

  • DOI: https://doi.org/10.1007/978-3-319-24592-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24591-1

  • Online ISBN: 978-3-319-24592-8

  • eBook Packages: Computer Science (R0)