Towards Ontology Generation from Tables

World Wide Web

Abstract

At the heart of today's information-explosion problems are issues involving semantics, mutual understanding, concept matching, and interoperability. Ontologies and the Semantic Web are offered as a potential solution, but creating ontologies for real-world knowledge is nontrivial. If we could automate the process, we could significantly improve our chances of making the Semantic Web a reality. While understanding natural language is difficult, tables and other structured information make it easier to interpret new items and relations. In this paper we introduce an approach to generating ontologies based on table analysis. We thus call our approach TANGO (Table ANalysis for Generating Ontologies). Based on conceptual modeling extraction techniques, TANGO attempts to (i) understand a table's structure and conceptual content; (ii) discover the constraints that hold between concepts extracted from the table; (iii) match the recognized concepts with ones from a more general specification of related concepts; and (iv) merge the resulting structure with other similar knowledge representations. TANGO is thus a formalized method of processing the format and content of tables that can serve to incrementally build a relevant reusable conceptual ontology.
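Step (ii) above, discovering constraints that hold between concepts, can be illustrated with a minimal sketch. This is not TANGO's implementation: it assumes a table has already been reduced to rows keyed by column header, treats each header as a candidate concept, and tests only single-column functional dependencies. The function name `functional_dependencies` and the sample rows are hypothetical placeholders.

```python
from itertools import permutations


def functional_dependencies(rows):
    """Return (lhs, rhs) column pairs where every lhs value maps to one rhs value.

    `rows` is a list of dicts sharing the same keys (the column headers).
    A dependency observed in one table is only a candidate constraint;
    more data may refute it.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())
    found = []
    for lhs, rhs in permutations(columns, 2):
        seen = {}  # lhs value -> first rhs value observed with it
        holds = True
        for row in rows:
            # setdefault stores the rhs value the first time an lhs value appears
            if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
                holds = False
                break
        if holds:
            found.append((lhs, rhs))
    return found


# Hypothetical placeholder data, not taken from the paper.
table = [
    {"Country": "A", "Capital": "X", "Continent": "North"},
    {"Country": "B", "Capital": "Y", "Continent": "North"},
    {"Country": "C", "Capital": "Z", "Continent": "South"},
]
deps = functional_dependencies(table)
```

Here `Country` determines both `Capital` and `Continent`, while `Continent` determines neither, which suggests `Country` as a candidate key concept. Such candidates would still need checking against further tables before being recorded as ontology constraints.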

Author information

Corresponding author

Correspondence to Yuri A. Tijerino.

About this article

Cite this article

Tijerino, Y.A., Embley, D.W., Lonsdale, D.W. et al. Towards Ontology Generation from Tables. World Wide Web 8, 261–285 (2005). https://doi.org/10.1007/s11280-005-0360-8

  • DOI: https://doi.org/10.1007/s11280-005-0360-8
