Abstract
Our media is saturated with claims of “facts” made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims: for example, is a claim “cherry-picking”? This article proposes a framework that models claims based on structured data as parameterized queries. Intuitively, with its choice of parameter settings, a claim presents a particular (and potentially biased) view of the underlying data. A key insight is that we can learn a lot about a claim by “perturbing” its parameters and seeing how its conclusion changes. For example, a claim is not robust if small perturbations to its parameters can change its conclusion significantly. This framework allows us to formulate practical fact-checking tasks (reverse-engineering vague claims and countering questionable claims) as computational problems. Along with the modeling framework, we develop an algorithmic framework that enables efficient instantiations of “meta” algorithms by supplying appropriate algorithmic building blocks. We present real-world examples and experiments that demonstrate the power of our model, the efficiency of our algorithms, and the usefulness of their results.
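To make the idea of perturbing a claim's parameters concrete, here is a minimal sketch under stated assumptions; it is not the article's algorithmic framework. The window-comparison query, the parameter names `end` and `w`, the helper names, and the equal weighting of all perturbations are hypothetical choices made for illustration only.

```python
from typing import Callable, List, Tuple

# Hypothetical parameterization of a claim: (end index, window length).
Params = Tuple[int, int]


def window_change(series: List[float], params: Params) -> float:
    """Parameterized query (assumed form): total over the window of length w
    ending at index `end`, minus the total over the preceding window of length w."""
    end, w = params
    recent = sum(series[end - w + 1 : end + 1])
    earlier = sum(series[end - 2 * w + 1 : end - w + 1])
    return recent - earlier


def perturbations(series: List[float], params: Params, max_shift: int) -> List[Params]:
    """All parameter settings within max_shift of the original, in each
    parameter, for which both windows fit inside the series."""
    end, w = params
    settings = []
    for d_end in range(-max_shift, max_shift + 1):
        for d_w in range(-max_shift, max_shift + 1):
            e, v = end + d_end, w + d_w
            if v >= 1 and e - 2 * v + 1 >= 0 and e < len(series):
                settings.append((e, v))
    return settings


def fraction_weaker(
    series: List[float],
    params: Params,
    max_shift: int = 2,
    query: Callable[[List[float], Params], float] = window_change,
) -> float:
    """Fraction of perturbed settings whose result is strictly smaller than the
    original claim's result. A high value hints that the original parameters
    may have been cherry-picked (an illustrative measure, assumed here)."""
    original = query(series, params)
    results = [
        query(series, p)
        for p in perturbations(series, params, max_shift)
        if p != params
    ]
    if not results:
        return 0.0
    return sum(1 for r in results if r < original) / len(results)
```

Calling `fraction_weaker(series, (end, w))` on a time series gives a rough sense of how fragile a claim of the form "the last w periods are up by X over the previous w" is: if most nearby choices of `end` and `w` yield a smaller change, the stated parameters present an unusually favorable view of the data.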
Index Terms
- Computational Fact Checking through Query Perturbations