Abstract
Combining information from different Web sources is often a tedious and repetitive process: even a simple information request may require iterating over the result list of one Web query and using each individual result as input for a subsequent query. Data-centric mashups are one approach to such chained queries. They let users visually model the data flow as a graph in which the nodes represent data sources and the edges represent the data flow.
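The chained-query pattern described above can be illustrated with a minimal Python sketch. It assumes that each Web source is modeled as a function that takes one input record and returns a list of result records. The source names (`search_movies`, `lookup_cinemas`) and their data are hypothetical mock stand-ins, not the framework's actual sources or API. Each node of the graph is a source, and each edge feeds every result of the upstream node into the downstream node.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Source = Callable[[Record], List[Record]]

# Hypothetical mock sources standing in for real Web queries.
def search_movies(query: Record) -> List[Record]:
    """Return movies for a given actor (mock data)."""
    catalog = {
        "Actor A": [{"title": "Film 1"}, {"title": "Film 2"}],
    }
    return catalog.get(query["actor"], [])

def lookup_cinemas(query: Record) -> List[Record]:
    """Return cinemas showing a given title (mock data)."""
    showings = {
        "Film 1": [{"cinema": "Cinema X"}],
        "Film 2": [{"cinema": "Cinema Y"}, {"cinema": "Cinema Z"}],
    }
    return showings.get(query["title"], [])

def run_chain(chain: List[Source], initial: Record) -> List[Record]:
    """Execute a linear data-flow graph: each node is a source, and each edge
    passes every result of one node as input to the next node."""
    current: List[Record] = [initial]
    for source in chain:
        next_results: List[Record] = []
        for record in current:
            for result in source(record):
                # Merge input and output so that later nodes (and the user)
                # see the full binding that produced each result.
                next_results.append({**record, **result})
        current = next_results
    return current

if __name__ == "__main__":
    for row in run_chain([search_movies, lookup_cinemas], {"actor": "Actor A"}):
        print(row)
```

This sketch only covers linear chains. A general mashup graph may branch or join, which would require evaluating nodes in topological order instead of a simple loop.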
In this paper we combine the benefits of such an intuitive graphical modeling framework for chained queries with the large class of Web data sources that are accessible only by filling out forms. These so-called Deep Web sites offer a wealth of structured, high-quality data but also pose several challenges. We identify and address the main challenges and propose an integrated framework for answering chained queries.
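Deep Web sources can only be queried by filling out a form, so certain input fields must be bound before a source can be called at all. The following minimal Python sketch illustrates this constraint. It assumes that each form is described only by its required input fields and the output attributes it returns; `FormSource`, `check_executable`, and the example forms are hypothetical. The check decides whether a chain is executable, meaning every required field is bound either by a user-supplied value or by the output of an earlier source. It is a simplified illustration of access-pattern checking, not the paper's algorithm.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

@dataclass(frozen=True)
class FormSource:
    """A Deep Web source reachable only through an HTML form (hypothetical model)."""
    name: str
    required_inputs: FrozenSet[str]  # form fields that must be filled in
    outputs: FrozenSet[str]          # attributes extracted from the result page

def check_executable(chain: List[FormSource], user_inputs: Set[str]) -> Tuple[bool, List[str]]:
    """Return (ok, problems). A chain is executable if, for each source in order,
    all required form fields are bound by user inputs or by earlier outputs."""
    bound = set(user_inputs)
    problems: List[str] = []
    for source in chain:
        missing = source.required_inputs - bound
        if missing:
            problems.append(f"{source.name}: unbound fields {sorted(missing)}")
        # Outputs become available to later forms even if this step is flagged,
        # so that all problems in the chain are reported at once.
        bound |= source.outputs
    return (not problems, problems)

if __name__ == "__main__":
    movies = FormSource("movie_search", frozenset({"actor"}), frozenset({"title", "year"}))
    cinemas = FormSource("cinema_search", frozenset({"title", "city"}), frozenset({"cinema"}))
    print(check_executable([movies, cinemas], {"actor"}))
    print(check_executable([movies, cinemas], {"actor", "city"}))
```

In the usage example, the first chain is rejected because the cinema form also needs a `city` value that neither the user nor the movie form supplies. Adding `city` to the user inputs makes the chain executable.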
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hornung, T., Simon, K., Lausen, G. (2009). Mashups over the Deep Web. In: Cordeiro, J., Hammoudi, S., Filipe, J. (eds) Web Information Systems and Technologies. WEBIST 2008. Lecture Notes in Business Information Processing, vol 18. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01344-7_17
DOI: https://doi.org/10.1007/978-3-642-01344-7_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01343-0
Online ISBN: 978-3-642-01344-7
eBook Packages: Computer Science, Computer Science (R0)