Abstract
HTML tables are information-rich and appear frequently in HTML documents, but they are mainly presentation-oriented and are poorly suited to database applications. In this paper, we introduce a conceptual model for HTML tables and, based on it, present a new approach to wrapping HTML tables into XML documents. The approach automatically converts basic HTML tables, nested tables, composite HTML tables, and tables without marked headings into XML documents.
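To make the idea of wrapping concrete, the following is a minimal sketch for the simplest case only: a basic, non-nested table whose cells are properly closed. It is not the conceptual model or the algorithm of the paper, which are not given in the abstract. The names `wrap_table`, `_TableRows`, and `_xml_name`, the element names `table` and `row`, and the fallback column names `col1`, `col2`, ... are all hypothetical choices made for illustration. A first row made up entirely of `<th>` cells is assumed to hold the headings; otherwise the generic column names are used.

```python
from html.parser import HTMLParser
import re
import xml.etree.ElementTree as ET


class _TableRows(HTMLParser):
    """Collect the rows of one non-nested table as (is_header, text) cells."""

    def __init__(self):
        super().__init__()
        self.rows = []          # each row is a list of (is_header, text)
        self._cell = None       # text fragments of the open cell, or None
        self._is_header = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []
            self._is_header = (tag == "th")

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            text = " ".join("".join(self._cell).split())
            self.rows[-1].append((self._is_header, text))
            self._cell = None


def _xml_name(label, fallback):
    """Map a heading label to a legal XML element name (illustrative rule)."""
    name = re.sub(r"[^A-Za-z0-9_]+", "_", label).strip("_")
    if not name or not name[0].isalpha():
        return fallback
    return name


def wrap_table(html, root_tag="table", row_tag="row"):
    """Return an ElementTree element holding one child element per data row."""
    parser = _TableRows()
    parser.feed(html)
    parser.close()
    rows = [r for r in parser.rows if r]
    root = ET.Element(root_tag)
    if not rows:
        return root
    width = max(len(r) for r in rows)
    names = [f"col{i + 1}" for i in range(width)]
    # Assumption: a first row consisting only of <th> cells is the heading row.
    if all(is_header for is_header, _ in rows[0]):
        for i, (_, label) in enumerate(rows[0]):
            names[i] = _xml_name(label, names[i])
        rows = rows[1:]
    for r in rows:
        row_el = ET.SubElement(root, row_tag)
        for name, (_, text) in zip(names, r):
            ET.SubElement(row_el, name).text = text
    return root


if __name__ == "__main__":
    html = ("<table><tr><th>Name</th><th>Unit Price</th></tr>"
            "<tr><td>Pen</td><td>1.20</td></tr></table>")
    print(ET.tostring(wrap_table(html), encoding="unicode"))
    # <table><row><Name>Pen</Name><Unit_Price>1.20</Unit_Price></row></table>
```

Nested tables, composite tables, and heading detection beyond the `<th>` check are deliberately left out of this sketch; handling them is the contribution of the paper itself.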
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, S., Liu, M., Peng, Z. (2004). Wrapping HTML Tables into XML. In: Zhou, X., Su, S., Papazoglou, M.P., Orlowska, M.E., Jeffery, K. (eds) Web Information Systems – WISE 2004. WISE 2004. Lecture Notes in Computer Science, vol 3306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30480-7_16
DOI: https://doi.org/10.1007/978-3-540-30480-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23894-2
Online ISBN: 978-3-540-30480-7
eBook Packages: Springer Book Archive