Retrieving Documents Related to Database Queries

  • Conference paper
SOFSEM 2015: Theory and Practice of Computer Science (SOFSEM 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8939)

Abstract

Databases and documents are commonly isolated from each other, managed by Database Management Systems (DBMS) and Information Retrieval Systems (IRS), respectively. However, both kinds of system are likely to store data about the same entities, which is a strong argument in favor of their integration. We propose a DBMS-IRS integration approach that uses terms in DBMS queries as keywords for IRS searches, retrieving documents strongly related to the queries. The IRS keywords are built by “expanding” an initial set of user-provided keywords with top-ranked terms found in a query result; the terms are ranked by a measure of term diffusion over the query result. Our experiments show the effectiveness of the approach in two different domains, in comparison with other DBMS-IRS integration methods as well as with other term-ranking methods.
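
The keyword-expansion idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the diffusion measure, so the score used here (the fraction of result rows in which a term occurs) is only an assumed stand-in, and the function names `tokenize`, `rank_terms_by_diffusion`, and `expand_keywords`, the parameter `k`, and the sample rows are all hypothetical.

```python
import re
from collections import Counter


def tokenize(value):
    """Lowercase a cell value and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", str(value).lower())


def rank_terms_by_diffusion(rows):
    """Score each term by the fraction of result rows in which it appears.

    ASSUMPTION: this row-frequency score is an illustrative stand-in for
    "term diffusion"; it is not the measure defined in the paper.
    """
    if not rows:
        return []
    row_counts = Counter()
    for row in rows:
        terms_in_row = set()
        for value in row:
            terms_in_row.update(tokenize(value))
        # Count each term at most once per row.
        row_counts.update(terms_in_row)
    total = len(rows)
    scored = [(term, count / total) for term, count in row_counts.items()]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def expand_keywords(user_keywords, rows, k=3):
    """Append up to k top-ranked result terms not already among the keywords."""
    expanded = [kw.lower() for kw in user_keywords]
    limit = len(expanded) + k
    for term, _score in rank_terms_by_diffusion(rows):
        if len(expanded) >= limit:
            break
        if term not in expanded:
            expanded.append(term)
    return expanded


# Hypothetical query result: each tuple is one row returned by the DBMS.
rows = [
    ("Alan Turing", "computability"),
    ("Alan Kay", "object orientation"),
    ("Grace Hopper", "compilers"),
]
print(expand_keywords(["pioneers"], rows, k=2))
```

The expanded keyword list would then be passed to the IRS as an ordinary keyword search; how the paper weights or filters the added terms is not stated in the abstract.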

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Catão, V.S., Sampaio, M.C., Schiel, U. (2015). Retrieving Documents Related to Database Queries. In: Italiano, G.F., Margaria-Steffen, T., Pokorný, J., Quisquater, JJ., Wattenhofer, R. (eds) SOFSEM 2015: Theory and Practice of Computer Science. SOFSEM 2015. Lecture Notes in Computer Science, vol 8939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46078-8_41

  • DOI: https://doi.org/10.1007/978-3-662-46078-8_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46077-1

  • Online ISBN: 978-3-662-46078-8

  • eBook Packages: Computer Science, Computer Science (R0)
