Inferring Types on Large Datasets Applying Ontology Class Hierarchy Classifiers: The DBpedia Case

  • Conference paper
Knowledge Engineering and Knowledge Management (EKAW 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11313)

Abstract

Adding type information to resources in large knowledge graphs is a challenging task, especially for graphs that are generated collaboratively, such as DBpedia, which usually contain errors and noise introduced while transforming data from different sources. Assigning the correct type(s) to resources is important for exploiting the information in the dataset efficiently. In this work we explore how machine learning classification models can be applied to this problem, relying on the information defined by the ontology class hierarchy. We have applied our approaches to DBpedia and compared them with the state of the art, using a per-level analysis. We also define metrics to measure the quality of the results. Our results show that this approach is able to assign 56% more new types, with higher precision and recall, than the current DBpedia state of the art.

This work was partially funded by grant CAS18/00333 (Castillejo), and projects RTC-2016-4952-7 (esTA) and TIN2016-78011-C4-4-R (Datos 4.0), from the Spanish State Investigation Agency of the MINECO and FEDER Funds.
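
The per-level analysis mentioned in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy class hierarchy, the function names (`depth`, `with_ancestors`, `per_level_scores`) and the choice of plain precision and recall per hierarchy level are hypothetical, and are not the metrics or implementation defined in the paper, which this preview does not show.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> parent); NOT taken from the paper.
PARENT = {
    "Agent": "Thing",
    "Person": "Agent",
    "Athlete": "Person",
    "Place": "Thing",
    "City": "Place",
}


def depth(cls):
    """Number of subclass edges from cls up to the root class."""
    d = 0
    while cls in PARENT:
        cls = PARENT[cls]
        d += 1
    return d


def with_ancestors(types):
    """Close a set of types under the subclass relation."""
    closed = set()
    for cls in types:
        closed.add(cls)
        while cls in PARENT:
            cls = PARENT[cls]
            closed.add(cls)
    return closed


def per_level_scores(gold, predicted):
    """Return {level: (precision, recall)} aggregated over all resources."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for resource in gold.keys() | predicted.keys():
        g = with_ancestors(gold.get(resource, ()))
        p = with_ancestors(predicted.get(resource, ()))
        for cls in g | p:
            level = depth(cls)
            if cls in g and cls in p:
                tp[level] += 1
            elif cls in p:
                fp[level] += 1
            else:
                fn[level] += 1
    scores = {}
    for level in sorted(set(tp) | set(fp) | set(fn)):
        predicted_total = tp[level] + fp[level]
        gold_total = tp[level] + fn[level]
        precision = tp[level] / predicted_total if predicted_total else 0.0
        recall = tp[level] / gold_total if gold_total else 0.0
        scores[level] = (precision, recall)
    return scores


if __name__ == "__main__":
    # Made-up gold and predicted types for two made-up resources.
    gold = {"r1": {"Athlete"}, "r2": {"City"}}
    predicted = {"r1": {"Person"}, "r2": {"Person"}}
    for level, (precision, recall) in per_level_scores(gold, predicted).items():
        print(level, precision, recall)
```

Expanding each type to its ancestors before scoring means that a prediction that is too general still earns credit at the upper levels, while errors deeper in the hierarchy only affect the levels where they occur.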

Notes

  1. https://github.com/anuzzolese/oke-challenge

  2. https://github.com/anuzzolese/oke-challenge-2016

  3. http://ner.vse.cz/datasets/linkedhypernyms/evaluation/

Author information

Corresponding author

Correspondence to Mariano Rico.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Rico, M., Santana-Pérez, I., Pozo-Jiménez, P., Gómez-Pérez, A. (2018). Inferring Types on Large Datasets Applying Ontology Class Hierarchy Classifiers: The DBpedia Case. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds) Knowledge Engineering and Knowledge Management. EKAW 2018. Lecture Notes in Computer Science, vol 11313. Springer, Cham. https://doi.org/10.1007/978-3-030-03667-6_21

  • DOI: https://doi.org/10.1007/978-3-030-03667-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03666-9

  • Online ISBN: 978-3-030-03667-6

  • eBook Packages: Computer Science (R0)