A Combined Approach for Ontology Enrichment from Textual and Open Data

Alec, Céline; Reynaud-Delaître, Chantal; Safar, Brigitte

doi:10.1007/978-3-319-65406-5_1

Chapter
First Online: 11 October 2017

Part of the book series: Studies in Computational Intelligence (SCI, volume 732)

Abstract

This paper proposes an ontology enrichment approach for automatically labeling documents that describe entities, using very specific concepts that reflect particular users' needs. The approach addresses a triple challenge: (1) the concepts used for labeling have no direct terminology in the documents, (2) their formal definitions are not initially known, and (3) the information needed to label the documents is not necessarily mentioned in them. To solve these problems, we use an existing ontology of the domain concerned and enrich it with definitions of the concepts used for labeling. To construct these definitions, we work on a set of manually labeled documents, used as examples. The ontology is populated with information extracted from these documents and with information from external resources (Linked Open Data). The target definitions can then be learned from this populated ontology and the set of labeled documents. The learned definitions are then added to the ontology (ontology enrichment). Whenever new documents of the same domain have to be labeled, the ontology can be populated in the same way and the definitions applied, allowing the new documents to be labeled. This approach, named Saupodoc, is a novel approach to ontology population and enrichment that exploits the foundations of the Semantic Web by combining text analysis, Linked Open Data extraction, machine learning, and reasoning tools. An evaluation on two application domains yields good-quality results and demonstrates the value of the approach.
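The workflow summarized in the abstract (populate, learn, enrich, label) can be illustrated with a toy sketch. This is not Saupodoc and is not taken from the chapter. It assumes a tourism-like domain, a hypothetical `MOCK_LOD` table in place of real Linked Open Data, keyword spotting in place of text analysis, and a plain set intersection in place of a description-logic concept learner and reasoner. All names and data are invented for illustration.

```python
import re
from typing import Dict, FrozenSet, List, Set

# ASSUMPTION: a tiny in-memory table standing in for Linked Open Data.
MOCK_LOD: Dict[str, Set[str]] = {
    "Bali": {"hasClimate:tropical"},
    "Phuket": {"hasClimate:tropical"},
    "Zanzibar": {"hasClimate:tropical"},
    "Oslo": {"hasClimate:cold"},
    "Bergen": {"hasClimate:cold"},
    "Nice": {"hasClimate:mild"},
}

# ASSUMPTION: keyword spotting stands in for real text analysis.
KEYWORD_TO_FACT: Dict[str, str] = {
    "beach": "hasFeature:beach",
    "museum": "hasFeature:museum",
    "ski": "hasFeature:ski",
}


def populate(entity: str, text: str) -> FrozenSet[str]:
    """Gather facts about one entity from its document and from the LOD table."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    from_text = {fact for kw, fact in KEYWORD_TO_FACT.items() if kw in words}
    from_lod = MOCK_LOD.get(entity, set())
    return frozenset(from_text | from_lod)


def learn_definition(
    positives: List[FrozenSet[str]], negatives: List[FrozenSet[str]]
) -> FrozenSet[str]:
    """Learn a conjunctive definition: the facts shared by every positive example.

    Raises ValueError if some negative example also satisfies the definition.
    """
    if not positives:
        raise ValueError("at least one positive example is required")
    definition = frozenset.intersection(*positives)
    if any(definition <= neg for neg in negatives):
        raise ValueError("no conjunctive definition separates the examples")
    return definition


def label(facts: FrozenSet[str], definition: FrozenSet[str]) -> bool:
    """An entity gets the label when it satisfies every fact in the definition."""
    return definition <= facts


# Manually labeled example documents: (entity, text, has_label).
LABELED = [
    ("Bali", "Long beach and warm sea", True),
    ("Phuket", "Beach bars and a museum", True),
    ("Oslo", "A museum and a ski resort nearby", False),
    ("Nice", "A beach and a museum", False),
]

positives = [populate(e, t) for e, t, y in LABELED if y]
negatives = [populate(e, t) for e, t, y in LABELED if not y]

# "Enrichment" here simply means storing the learned definition for reuse.
DEFINITION = learn_definition(positives, negatives)

# New documents are populated the same way, then checked against the definition.
zanzibar_labeled = label(populate("Zanzibar", "White sand beach"), DEFINITION)
bergen_labeled = label(populate("Bergen", "A beach in summer"), DEFINITION)
```

In this toy data the climate fact never appears in the document text and is only supplied by the mock external table, which mirrors challenge (3): both new documents mention a beach, but only the external data distinguishes them.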

Acknowledgements

We acknowledge the Wepingo startup, which funded this work within the Poraso project.

Author information

Authors and Affiliations

LRI, Univ. Paris-Sud, CNRS, Université Paris-Saclay, 91405, Orsay, France
Céline Alec, Chantal Reynaud-Delaître & Brigitte Safar

Corresponding author

Correspondence to Céline Alec.

Editor information

Editors and Affiliations

University of Bordeaux, Bordeaux, France
Bruno Pinaud
Polytech Nantes, Nantes, France
Fabrice Guillet
University of Caen Normandie, Caen Cedex 5, France
Bruno Cremilleux
University of Reims Champagne-Ardenne, Reims, France
Cyril de Runz

About this chapter

Cite this chapter

Alec, C., Reynaud-Delaître, C., Safar, B. (2018). A Combined Approach for Ontology Enrichment from Textual and Open Data. In: Pinaud, B., Guillet, F., Cremilleux, B., de Runz, C. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 732. Springer, Cham. https://doi.org/10.1007/978-3-319-65406-5_1

DOI: https://doi.org/10.1007/978-3-319-65406-5_1
Published: 11 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65405-8
Online ISBN: 978-3-319-65406-5
eBook Packages: Engineering, Engineering (R0)
