research-article

Public Access

Computational Fact Checking through Query Perturbations

Authors:
You Wu

Duke University

Duke University

0000-0002-0331-4975
View Profile

,
Pankaj K. Agarwal

Duke University, NC, USA

Duke University, NC, USA
View Profile

,
Chengkai Li

University of Texas at Arlington, Arlington, TX

University of Texas at Arlington, Arlington, TX
View Profile

,
Jun Yang

Duke University, NC, USA

Duke University, NC, USA
View Profile

,
Cong Yu

Google Research, New York, NY, USA

Google Research, New York, NY, USA
View Profile

Authors Info & Claims

ACM Transactions on Database Systems Volume 42 Issue 1Article No.: 4pp 1–41https://doi.org/10.1145/2996453

Published:09 January 2017Publication History

ACM Transactions on Database Systems

Abstract

Our media is saturated with claims of “facts” made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, for example, is a claim “cherry-picking”? This article proposes a framework that models claims based on structured data as parameterized queries. Intuitively, with its choice of the parameter setting, a claim presents a particular (and potentially biased) view of the underlying data. A key insight is that we can learn a lot about a claim by “perturbing” its parameters and seeing how its conclusion changes. For example, a claim is not robust if small perturbations to its parameters can change its conclusions significantly. This framework allows us to formulate practical fact-checking tasks—reverse-engineering vague claims, and countering questionable claims—as computational problems. Along with the modeling framework, we develop an algorithmic framework that enables efficient instantiations of “meta” algorithms by supplying appropriate algorithmic building blocks. We present real-world examples and experiments that demonstrate the power of our model, efficiency of our algorithms, and usefulness of their results.

References

Charu C. Aggarwal (Ed.). 2009. Managing and Mining Uncertain Data. Springer. Google ScholarDigital Library
Raju Balakrishnan and Subbarao Kambhampati. 2011. SourceRank: Relevance and trust assessment for deep web sources based on inter-source agreement. In Proceedings of the 2011 International Conference on World Wide Web. 227--236. Google ScholarDigital Library
Philip A. Bernstein and Laura M. Haas. 2008. Information integration in the enterprise. Commun. ACM 51, 9 (2008), 72--79. Google ScholarDigital Library
Stephan Börzsönyi, Donald Kossmann, and Konrad Stocker. 2001. The skyline operator. In Proceedings of the 2001 International Conference on Data Engineering. 421--430. Google ScholarDigital Library
Christian Buchta. 1989. On the average number of maxima in a set of vectors. Inform. Process. Lett. 33, 2 (1989), 63--65. Google ScholarDigital Library
Surajit Chaudhuri. 1990. Generalization and a framework for query modification. In Proceedings of the 6th International Conference on Data Engineering, 1990. IEEE, 138--145. Google ScholarDigital Library
Bernard Chazelle. 1988. A functional approach to data structures and its use in multidimensional searching. SIAM J. Comput. 17, 3 (1988), 427--462. Google ScholarDigital Library
Wesley W. Chu, Qiming Chen, and Rei-Chi Lee. 1991. Cooperative Query Answering via Type Abstraction Hierarchy. Springer. Google ScholarCross Ref
Sarah Cohen, James T. Hamilton, and Fred Turner. 2011a. Computational journalism. Commun. ACM 54, 10 (2011), 66--71. Google ScholarDigital Library
Sarah Cohen, Chengkai Li, Jun Yang, and Cong Yu. 2011b. Computational journalism: A call to arms to database researchers. In Proceedings of the 2011 Conference on Innovative Data Systems Research. Google ScholarDigital Library
Harish D., Pooja N. Darera, and Jayant R. Haritsa. 2008. Identifying robust plans through plan diagram reduction. In Proceedings of the 2008 International Conference on Very Large Data Bases. 1124--1140.Google Scholar
Nilesh N. Dalvi, Christopher Ré, and Dan Suciu. 2009. Probabilistic databases: Diamonds in the dirt. Commun. ACM 52, 7 (2009), 86--94. Google ScholarDigital Library
Anish Das Sarma, Aditya G. Parameswaran, Hector Garcia-Molina, and Jennifer Widom. 2010. Synthesizing view definitions from data. In Proceedings of the 2010 International Conference on Database Theory. 89--103. Google ScholarDigital Library
Mark De Berg, Marc Van Kreveld, Mark Overmars, and Otfried Cheong Schwarzkopf. 2000. Computational Geometry. Springer. Google ScholarCross Ref
AnHai Doan, Alon Halevy, and Zachary Ives. 2012. Principles of Data Integration (1st ed.). Morgan Kaufmann. Google ScholarDigital Library
Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. 2009. Integrating conflicting data: The role of source dependence. Proc. VLDB Endow. 2, 1 (2009), 550--561. Google ScholarDigital Library
Ronald Fagin, Amnon Lotem, and Moni Naor. 2003. Optimal aggregation algorithms for middleware. J. Comput. System Sci. 66, 4 (2003), 614--656. Google ScholarDigital Library
Sumit Ganguly. 1998. Design and analysis of parametric query optimization algorithms. In Proceedings of the 1998 International Conference on Very Large Data Bases. 228--238. Google ScholarDigital Library
Jim Giles. 2012. Truth goggles. The New Scientist 2882 (Sept. 2012), 44--47. Google ScholarCross Ref
Jim Gray, Adam Bosworth, Andrew Layman, and Hamid Pirahesh. 1996. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-total. In Proceedings of the 1996 International Conference on Data Engineering. 152--159. Google ScholarDigital Library
Dov Harel and Robert E. Tarjan. 1984. Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13, 2 (1984), 338--355. Google ScholarDigital Library
Zhian He and Eric Lo. 2012. Answering why-not questions on top-k queries. In Proceedings of the 2012 International Conference on Data Engineering. 750--761. Google ScholarDigital Library
Soon-Young Huh, Kae-Hyun Moon, and Hee-Seok Lee. 2000. A data abstraction approach for query relaxation. Inf. Softw. Technol. 42, 6 (2000), 407--418. Google ScholarCross Ref
Arvind Hulgeri and S. Sudarshan. 2003. AniPQO: Almost non-intrusive parametric query optimization for nonlinear cost functions. In Proceedings of the 2003 International Conference on Very Large Data Bases. 766--777. Google ScholarDigital Library
Yannis E. Ioannidis, Raymond T. Ng, Kyuseok Shim, and Timos K. Sellis. 1992. Parametric query optimization. In Proceedings of the 1992 International Conference on Very Large Data Bases. 103--114. Google ScholarDigital Library
Ravi Jampani, Fei Xu, Mingxi Wu, Luis Leopoldo Perez, Chris Jermaine, and Peter J. Haas. 2011. The Monte Carlo database system: Stochastic analysis close to the data. ACM Trans. Database Syst. 36, 3 (2011), 18. Google ScholarDigital Library
Christian S. Jensen and Richard Snodgrass. 1994. Temporal specialization and generalization. IEEE Trans. Knowl. Data Eng. 6, 6 (1994), 954--974. Google ScholarDigital Library
Jia-Ling Koh, Kuang-Ting Chiang, and I.-Chih Chiu. 2013. The strategies for supporting query specialization and query generalization in social tagging systems. In Database Systems for Advanced Applications. Springer, 164--178. Google ScholarDigital Library
Hsiang-Tsung Kung, Fabrizio Luccio, and Franco P. Preparata. 1975. On finding the maxima of a set of vectors. J. ACM 22, 4 (1975), 469--476. Google ScholarDigital Library
Xian Li, Weiyi Meng, and Clement T. Yu. 2011. T.-verifier: Verifying truthfulness of fact statements. In Proceedings of the 2011 International Conference on Data Engineering. 63--74. Google ScholarDigital Library
Yunyao Li, Ishan Chaudhuri, Huahai Yang, Satinder Singh, and H. V. Jagadish. 2007. DaNaLIX: A domain-adaptive natural language interface for querying XML. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. 1165--1168. Google ScholarDigital Library
Yunyao Li, Huahai Yang, and H. V. Jagadish. 2006. Constructing a generic natural language interface for an XML database. In Proceedings of the 2006 International Conference on Extending Database Technology. 737--754. Google ScholarDigital Library
Xika Lin, Abhishek Mukherji, Elke A. Rundensteiner, Carolina Ruiz, and Matthew O. Ward. 2013. PARAS: A parameter space framework for online association mining. Proc. VLDB Endow. 6, 3 (2013), 193--204. Google ScholarDigital Library
Kurt Mehlhorn and Stefan Näher. 1990. Dynamic fractional cascading. Algorithmica 5, 1--4 (1990), 215--241.Google ScholarCross Ref
Kyriakos Mouratidis and HweeHwa Pang. 2012. Computing immutable regions for subspace top-k queries. Proc.VLDB Endow. 6, 2 (2012), 73--84. Google ScholarDigital Library
Ana-Maria Popescu, Oren Etzioni, and Henry A. Kautz. 2003. Towards a theory of natural language interfaces to databases. In Proceedings of the 2003 International Conference on Intelligent User Interfaces. 149--157. Google ScholarDigital Library
Alexander J. Quinn and Benjamin B. Bederson. 2011. Human computation: A survey and taxonomy of a growing field. In Proceedings of the 2011 International Conference on Human Factors in Computing Systems. 1403--1412. Google ScholarDigital Library
Sudeepa Roy and Dan Suciu. 2014. A formal approach to finding explanations for database queries. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 1579--1590. Google ScholarDigital Library
Mohamed A. Soliman, Ihab F. Ilyas, Davide Martinenghi, and Marco Tagliasacchi. 2011. Ranking with uncertain scoring functions: Semantics and sensitivity measures. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. 805--816. Google ScholarDigital Library
Robert Endre Tarjan. 1979. Applications of path compression on balanced trees. J. ACM 26, 4 (1979), 690--715. Google ScholarDigital Library
Quoc Trung Tran and Chee-Yong Chan. 2010. How to ConQueR why-not questions. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. 15--26. Google ScholarDigital Library
Quoc Trung Tran, Chee-Yong Chan, and Srinivasan Parthasarathy. 2009. Query by output. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. 535--548. Google ScholarDigital Library
Brett Walenz and Jun Yang. 2016. Perturbation analysis of database queries. Proc. VLDB Endow 9, 14 (2016). Google ScholarDigital Library
Eugene Wu and Samuel Madden. 2013. Scorpion: Explaining away outliers in aggregate queries. Proc. VLDB Endow. 6, 8 (June 2013), 553--564. Google ScholarDigital Library
You Wu, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. 2012. On “one of the few” objects. In Proceedings of the 2012 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1487--1495. Google ScholarDigital Library
You Wu, Brett Walenz, Peggy Li, Andrew Shim, Emre Sonmez, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. 2014. iCheck: Computationally combating lies, d--ned lies, and statistics. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 1063--1066. Google ScholarDigital Library
Yusuke Yamamoto and Katsumi Tanaka. 2009. Finding comparative facts and aspects for judging the credibility of uncertain facts. In Proceedings of the 2009 International Conference on Web Information Systems Engineering. 291--305. Google ScholarDigital Library
Yusuke Yamamoto, Taro Tezuka, Adam Jatowt, and Katsumi Tanaka. 2008. Supporting judgment of fact trustworthiness considering temporal and sentimental aspects. In Proceedings of the 2008 International Conference on Web Information Systems Engineering. 206--220. Google ScholarDigital Library
Albert Yu, Pankaj K. Agarwal, and Jun Yang. 2012. Processing a large number of continuous preference top-k queries. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. 397--408. Google ScholarDigital Library
Bo Zhao, Benjamin I. P. Rubinstein, Jim Gemmell, and Jiawei Han. 2012. A Bayesian approach to discovering truth from conflicting sources for data integration. Proc. VLDB Endow. 5, 6 (2012), 550--561. Google ScholarDigital Library

Index Terms

Computational Fact Checking through Query Perturbations
1. Information systems
  1. Information systems applications

Recommendations

Toward Automated Fact-Checking: Detecting Check-worthy Factual Claims by ClaimBuster
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

This paper introduces how ClaimBuster, a fact-checking platform, uses natural language processing and supervised learning to detect important factual claims in political discourses. The claim spotting model is built using a human-labeled dataset of ...
Read More
Toward computational fact-checking

Our news are saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-...
Read More
Detecting Check-worthy Factual Claims in Presidential Debates
CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management

Public figures such as politicians make claims about "facts" all the time. Journalists and citizens spend a good amount of time checking the veracity of such claims. Toward automatic fact checking, we developed tools to find check-worthy factual claims ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Database Systems Volume 42, Issue 1
Invited Paper from ICDT 2014, Invited Paper from EDBT 2015, Regular Papers and Technical Correspondence
March 2017
263 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/3015779
Editor:
Christian S. Jensen
Aalborg University, Denmark
Issue’s Table of Contents
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 January 2017
- Accepted: 1 September 2016
- Revised: 1 May 2016
- Received: 1 June 2015
Published in tods Volume 42, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Sensitivity analysis
computational journalism
fact checking
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 21
  Total Citations
  View Citations
- 1,066
  Total Downloads
- Downloads (Last 12 months)93
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Computational Fact Checking through Query Perturbations

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Toward Automated Fact-Checking: Detecting Check-worthy Factual Claims by ClaimBuster

Toward computational fact-checking

Detecting Check-worthy Factual Claims in Presidential Debates

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Computational Fact Checking through Query Perturbations

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Toward Automated Fact-Checking: Detecting Check-worthy Factual Claims by ClaimBuster

Toward computational fact-checking

Detecting Check-worthy Factual Claims in Presidential Debates

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media