Abstract
In recent years, the importance of research data, and the need to archive and share it within the scientific community, has increased enormously. This introduces a whole new set of challenges for digital libraries. In the social sciences, typical research data sets consist of surveys and questionnaires. In this paper we focus on the use case of reusing social science survey questions and on mechanisms that support users in formulating queries for data sets. We describe and evaluate thesaurus-based and co-occurrence-based approaches to query expansion that aim to improve retrieval quality in digital libraries and research data archives. The challenge is to translate the information need and the underlying sociological phenomena into proper queries. We show that retrieval quality can be improved by adding related terms to the queries. In a direct comparison, queries expanded automatically with extracted co-occurring terms can provide better results than queries manually reformulated by a domain expert, and better results than a keyword-based BM25 baseline.
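The idea of expanding a query with co-occurring terms can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the Jaccard association measure, the function names build_cooccurrence and expand_query, and the sample question texts are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical survey question keywords, invented for this example.
SAMPLE_QUESTIONS = [
    "trust parliament government politics",
    "trust government institutions",
    "income household employment",
    "trust police institutions",
]


def build_cooccurrence(docs):
    """Count, per term and per term pair, the number of documents containing them."""
    term_df = Counter()
    pair_df = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        term_df.update(terms)
        # Sorted terms give each unordered pair one canonical (a, b) key.
        pair_df.update(combinations(terms, 2))
    return term_df, pair_df


def expand_query(query, term_df, pair_df, k=2):
    """Append the k terms with the highest summed Jaccard association (assumed measure)."""
    query_terms = query.lower().split()
    scores = Counter()
    for q in query_terms:
        for (a, b), n_ab in pair_df.items():
            if q == a:
                other = b
            elif q == b:
                other = a
            else:
                continue
            if other in query_terms:
                continue
            # Jaccard: shared documents divided by documents containing either term.
            scores[other] += n_ab / (term_df[q] + term_df[other] - n_ab)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    term_df, pair_df = build_cooccurrence(SAMPLE_QUESTIONS)
    print(expand_query("trust", term_df, pair_df, k=2))
```

The expanded term list would then be passed to the retrieval system in place of the original query. A thesaurus-based variant would look up related terms in a controlled vocabulary instead of computing them from the collection.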
Authors are listed in alphabetical order.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Dulisch, N., Kempf, A.O., Schaer, P. (2015). Query Expansion for Survey Question Retrieval in the Social Sciences. In: Kapidakis, S., Mazurek, C., Werla, M. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2015. Lecture Notes in Computer Science, vol 9316. Springer, Cham. https://doi.org/10.1007/978-3-319-24592-8_3
DOI: https://doi.org/10.1007/978-3-319-24592-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24591-1
Online ISBN: 978-3-319-24592-8
eBook Packages: Computer Science, Computer Science (R0)