Inferring Types on Large Datasets Applying Ontology Class Hierarchy Classifiers: The DBpedia Case

Rico, Mariano; Santana-Pérez, Idafen; Pozo-Jiménez, Pedro; Gómez-Pérez, Asunción

doi:10.1007/978-3-030-03667-6_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11313)

Included in the following conference series:

European Knowledge Acquisition Workshop

Abstract

Adding type information to resources in large knowledge graphs is a challenging task, especially for collaboratively generated graphs such as DBpedia, which usually contain errors and noise introduced when data from different sources are transformed. Assigning the correct type(s) to resources is important in order to exploit the information in the dataset efficiently. In this work we explore how machine learning classification models can be applied to this problem, relying on the information defined by the ontology class hierarchy. We have applied our approaches to DBpedia and compared them with the state of the art, using a per-level analysis. We also define metrics to measure the quality of the results. Our results show that this approach is able to assign 56% more new types, with higher precision and recall, than the current DBpedia state of the art.
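
The abstract does not spell out how the per-level analysis or the quality metrics are computed. As a rough illustration only, the sketch below assumes one plausible reading: each resource's most specific type is expanded to its ancestors, and precision and recall are then counted separately at each depth of the class hierarchy. The toy hierarchy, the resource names, and the function names (`level`, `with_ancestors`, `per_level_scores`) are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> parent); NOT the DBpedia ontology.
PARENT = {
    "Agent": "Thing",
    "Person": "Agent",
    "Athlete": "Person",
    "Place": "Thing",
    "City": "Place",
}


def level(cls):
    """Depth of a class in the hierarchy; the root 'Thing' is level 0."""
    depth = 0
    while cls in PARENT:
        cls = PARENT[cls]
        depth += 1
    return depth


def with_ancestors(cls):
    """Return the class plus all of its ancestors, excluding the root."""
    out = set()
    while cls in PARENT:
        out.add(cls)
        cls = PARENT[cls]
    return out


def per_level_scores(gold, predicted):
    """Compute {level: (precision, recall)}.

    gold and predicted map each resource to its most specific class.
    This is an assumed metric definition, used here for illustration.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for res in gold.keys() | predicted.keys():
        g = with_ancestors(gold[res]) if res in gold else set()
        p = with_ancestors(predicted[res]) if res in predicted else set()
        for c in g & p:   # correctly assigned types
            tp[level(c)] += 1
        for c in p - g:   # assigned but wrong
            fp[level(c)] += 1
        for c in g - p:   # expected but missing
            fn[level(c)] += 1
    scores = {}
    for lv in sorted(set(tp) | set(fp) | set(fn)):
        prec = tp[lv] / (tp[lv] + fp[lv]) if tp[lv] + fp[lv] else 0.0
        rec = tp[lv] / (tp[lv] + fn[lv]) if tp[lv] + fn[lv] else 0.0
        scores[lv] = (prec, rec)
    return scores


if __name__ == "__main__":
    gold = {"r1": "Athlete", "r2": "City", "r3": "Person"}
    predicted = {"r1": "Person", "r2": "City", "r3": "Athlete"}
    for lv, (prec, rec) in per_level_scores(gold, predicted).items():
        print(f"level {lv}: precision={prec:.2f} recall={rec:.2f}")
```

In this toy run the predictions agree with the gold types at levels 1 and 2 and disagree only at level 3, which is the kind of difference a per-level breakdown makes visible and a single aggregate score would hide.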

This work was partially funded by grant CAS18/00333 (Castillejo), and projects RTC-2016-4952-7 (esTA) and TIN2016-78011-C4-4-R (Datos 4.0), from the Spanish State Investigation Agency of the MINECO and FEDER Funds.

Author information

Authors and Affiliations

Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
Mariano Rico, Idafen Santana-Pérez, Pedro Pozo-Jiménez & Asunción Gómez-Pérez

Corresponding author

Correspondence to Mariano Rico.

Editor information

Editors and Affiliations

Université Côte d’Azur, CNRS, Inria, I3S, Sophia Antipolis, France
Catherine Faron Zucker
Fondazione Bruno Kessler, Trento, Italy
Chiara Ghidini
University of Lorraine, CNRS, Inria, LORIA, Nancy, France
Amedeo Napoli
University of Lorraine, CNRS, Inria, LORIA, Nancy, France
Yannick Toussaint

About this paper

Cite this paper

Rico, M., Santana-Pérez, I., Pozo-Jiménez, P., Gómez-Pérez, A. (2018). Inferring Types on Large Datasets Applying Ontology Class Hierarchy Classifiers: The DBpedia Case. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds) Knowledge Engineering and Knowledge Management. EKAW 2018. Lecture Notes in Computer Science, vol 11313. Springer, Cham. https://doi.org/10.1007/978-3-030-03667-6_21

DOI: https://doi.org/10.1007/978-3-030-03667-6_21
Published: 31 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03666-9
Online ISBN: 978-3-030-03667-6
eBook Packages: Computer Science, Computer Science (R0)
