Computational Fact Checking through Query Perturbations

Published: 09 January 2017

Abstract

Our media is saturated with claims of “facts” made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, for example, whether a claim is “cherry-picking.” This article proposes a framework that models claims based on structured data as parameterized queries. Intuitively, with its choice of the parameter setting, a claim presents a particular (and potentially biased) view of the underlying data. A key insight is that we can learn a lot about a claim by “perturbing” its parameters and seeing how its conclusion changes. For example, a claim is not robust if small perturbations to its parameters can change its conclusions significantly. This framework allows us to formulate practical fact-checking tasks (reverse-engineering vague claims and countering questionable claims) as computational problems. Along with the modeling framework, we develop an algorithmic framework that enables efficient instantiations of “meta” algorithms by supplying appropriate algorithmic building blocks. We present real-world examples and experiments that demonstrate the power of our model, efficiency of our algorithms, and usefulness of their results.
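The idea of treating a claim as a parameterized query and perturbing its parameters can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the toy `series` values are invented, the window-comparison query, the `perturb` helper, and the `fraction_weaker` score are hypothetical stand-ins, and none of them is the article's actual model, robustness measure, or algorithm.

```python
from itertools import product

# Toy yearly series: made-up numbers, purely illustrative.
series = {2000: 10, 2001: 12, 2002: 11, 2003: 15, 2004: 14, 2005: 20, 2006: 13}


def window_increase(data, end_year, width):
    """Parameterized query: change in value over `width` years ending at `end_year`."""
    return data[end_year] - data[end_year - width]


def perturb(data, end_year, width, max_shift=1):
    """Evaluate the same query at nearby parameter settings (original included)."""
    results = {}
    shifts = range(-max_shift, max_shift + 1)
    for d_end, d_width in product(shifts, repeat=2):
        e, w = end_year + d_end, width + d_width
        # Keep only settings whose window lies entirely inside the data.
        if w >= 1 and e in data and (e - w) in data:
            results[(e, w)] = window_increase(data, e, w)
    return results


def fraction_weaker(data, end_year, width, max_shift=1):
    """Hypothetical score: share of nearby settings with a smaller increase.

    A value close to 1.0 suggests the stated parameters were chosen to make
    the claim look as strong as possible.
    """
    original = window_increase(data, end_year, width)
    nearby = perturb(data, end_year, width, max_shift)
    nearby.pop((end_year, width))  # compare against perturbed settings only
    weaker = sum(1 for value in nearby.values() if value < original)
    return weaker / len(nearby)


if __name__ == "__main__":
    # Claim: "the value rose over the 3 years ending in 2005".
    print(window_increase(series, 2005, 3))
    print(fraction_weaker(series, 2005, 3))
```

In this sketch a claim is just a pair of parameters (`end_year`, `width`), so perturbing it means re-running the same query on neighboring pairs and comparing the results with the original. A real system would also need some way to weight perturbations by how reasonable they are, which this sketch omits.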

    • Published in

      ACM Transactions on Database Systems, Volume 42, Issue 1
      Invited Paper from ICDT 2014, Invited Paper from EDBT 2015, Regular Papers and Technical Correspondence
      March 2017
      263 pages
      ISSN: 0362-5915
      EISSN: 1557-4644
      DOI: 10.1145/3015779

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 January 2017
      • Accepted: 1 September 2016
      • Revised: 1 May 2016
      • Received: 1 June 2015

      Qualifiers

      • research-article
      • Research
      • Refereed
