research-article

Toward Veracity Assessment in RDF Knowledge Bases: An Exploratory Analysis

Published: 20 February 2018

Abstract

Among the characteristics of knowledge bases, data quality is one of the most relevant for maximizing the benefits of the provided information. Knowledge base quality assessment poses a number of big data challenges, such as high volume, variety, velocity, and veracity. In this article, we focus on answering questions related to the assessment of the veracity of facts through Deep Fact Validation (DeFacto), a triple validation framework designed to assess facts in RDF knowledge bases. Despite current developments in the research area, the underlying framework faces many challenges. This article pinpoints and discusses these issues and conducts a thorough analysis of the framework's pipeline, aiming to reduce error propagation through its components. Furthermore, we discuss recent developments related to fact validation and describe the advantages and drawbacks of state-of-the-art models. As a result of this exploratory analysis, we give insights and directions toward a better architecture to tackle the complex task of fact-checking in knowledge bases.

          • Published in

            Journal of Data and Information Quality, Volume 9, Issue 3
            Special Issue on Improving the Veracity and Value of Big Data
            September 2017
            140 pages
            ISSN:1936-1955
            EISSN:1936-1963
            DOI:10.1145/3183573
            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 20 February 2018
            • Accepted: 1 January 2018
            • Revised: 1 December 2017
            • Received: 1 April 2017

            Qualifiers

            • research-article
            • Research
            • Refereed