Abstract
Adding type information to resources in large knowledge graphs is a challenging task, especially for collaboratively generated graphs such as DBpedia, which usually contain errors and noise introduced when transforming data from different sources. Assigning the correct type(s) to resources is important for exploiting the information in the dataset efficiently. In this work we explore how machine learning classification models can be applied to this problem, relying on the information defined by the ontology class hierarchy. We applied our approaches to DBpedia and compared them with the state of the art using a per-level analysis. We also define metrics to measure the quality of the results. Our results show that this approach assigns 56% more new types, with higher precision and recall, than the current DBpedia state of the art.
This work was partially funded by grant CAS18/00333 (Castillejo), and projects RTC-2016-4952-7 (esTA) and TIN2016-78011-C4-4-R (Datos 4.0), from the Spanish State Investigation Agency of the MINECO and FEDER Funds.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Rico, M., Santana-Pérez, I., Pozo-Jiménez, P., Gómez-Pérez, A. (2018). Inferring Types on Large Datasets Applying Ontology Class Hierarchy Classifiers: The DBpedia Case. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds) Knowledge Engineering and Knowledge Management. EKAW 2018. Lecture Notes in Computer Science, vol 11313. Springer, Cham. https://doi.org/10.1007/978-3-030-03667-6_21
DOI: https://doi.org/10.1007/978-3-030-03667-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03666-9
Online ISBN: 978-3-030-03667-6
eBook Packages: Computer Science (R0)